Nucleic acid molecules encoding mismatch endonucleases and methods of use thereof

ABSTRACT

We describe here an in vitro method of increasing complementarity in a heteroduplex polynucleotide sequence. The method uses annealing of opposite strands to form a polynucleotide duplex with mismatches. The heteroduplex polynucleotide is combined with an effective amount of enzymes having strand cleavage activity, 3′ to 5′ exonuclease activity, and polymerase activity, and sufficient time is allowed for the percentage of complementarity to be increased within the heteroduplex. Not all heteroduplex polynucleotides will necessarily have all mismatches resolved to complementarity. The resulting polynucleotide is optionally ligated. Several variant polynucleotides result. At sites where either of the opposite strands has templated recoding in the other strand, the resulting percent complementarity of the heteroduplex polynucleotide sequence is increased. Also described are mismatch endonucleases suitable for use in the process.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a Continuation-in-part of U.S. patent application Ser. No. 11/417,448, filed May 3, 2006, now U.S. Pat. No. 7,273,739, which is a Continuation of U.S. patent application Ser. No. 10/211,079, filed Aug. 1, 2002, now U.S. Pat. No. 7,078,211, which is a Continuation-in-part of U.S. patent application Ser. No. 10/098,155, filed Mar. 14, 2002, now abandoned, which claims priority from U.S. Provisional Application No. 60/353,722, filed Feb. 1, 2002. These prior applications are hereby incorporated by reference in their entirety.

BACKGROUND OF THE INVENTION

The invention relates generally to molecular biology and more specifically to methods of generating populations of related nucleic acid molecules.

DNA shuffling is a powerful tool for obtaining recombinants between two or more DNA sequences to evolve them in an accelerated manner. The parental, or input, DNAs for the process of DNA shuffling are typically mutants or variants of a given gene that have some improved character over the wild-type. The products of DNA shuffling represent a pool of essentially random reassortments of gene sequences from the parental DNAs that can then be analyzed for additive or synergistic effects resulting from new sequence combinations.

Recursive sequence reassortment is analogous to an evolutionary process where only variants with suitable properties are allowed to contribute their genetic material to the production of the next generation. Optimized variants are generated through DNA shuffling-mediated sequence reassortment followed by testing for incremental improvements in performance. Additional cycles of reassortment and testing lead to the generation of genes that contain new combinations of the genetic improvements identified in previous rounds of the process. Reassorting and combining beneficial genetic changes allows an optimized sequence to arise without having to individually generate and screen all possible sequence combinations.

This differs sharply from random mutagenesis, where subsequent improvements to an already improved sequence result largely from serendipity. For example, in order to obtain a protein that has a desired set of enhanced properties, it may be necessary to identify a mutant that contains a combination of various beneficial mutations. If no process is available for combining these beneficial genetic changes, further random mutagenesis will be required. However, random mutagenesis requires repeated cycles of generating and screening large numbers of mutants, resulting in a process that is tedious and highly labor intensive. Moreover, the rate at which sequences incur mutations with undesirable effects increases with the information content of a sequence. Hence, as the information content, library size, and mutagenesis rate increase, the ratio of deleterious mutations to beneficial mutations will increase, increasingly masking the selection of further improvements. Lastly, some computer simulations have suggested that point mutagenesis alone may often be too gradual to allow the large-scale block changes that are required for continued and dramatic sequence evolution.

There are a number of different techniques used for random mutagenesis. For example, one method utilizes error-prone polymerase chain reaction (PCR) for creating mutant genes in a library format (Cadwell and Joyce, 1992; Gram et al., 1992). Another method is cassette mutagenesis (Arkin and Youvan, 1992; Delagrave et al., 1993; Delagrave and Youvan, 1993; Goldman and Youvan, 1992; Hermes et al., 1990; Oliphant et al., 1986; Stemmer et al., 1993) in which the specific region to be optimized is replaced with a synthetically mutagenized oligonucleotide.

Error-prone PCR uses low-fidelity polymerization conditions to introduce a low level of point mutations randomly over a sequence. A limitation to this method, however, is that published error-prone PCR protocols suffer from a low processivity of the polymerase, making this approach inefficient at producing random mutagenesis in an average-sized gene.

In oligonucleotide-directed random mutagenesis, a short sequence is replaced with a synthetically mutagenized oligonucleotide. To generate combinations of distant mutations, different sites must be addressed simultaneously by different oligonucleotides. The limited library size that is obtained in this way, relative to the library size required to saturate all sites, means that many rounds of selection are required for optimization. Mutagenesis with synthetic oligonucleotides requires sequencing of individual clones after each selection round followed by grouping them into families, arbitrarily choosing a single family, and reducing it to a consensus motif. Such a motif is resynthesized and reinserted into a single gene followed by additional selection. This step creates a statistical bottleneck, is labor intensive, and is not practical for many rounds of mutagenesis.

For these reasons, error-prone PCR and oligonucleotide-directed mutagenesis can be used for mutagenesis protocols that require relatively few cycles of sequence alteration, such as for sequence fine-tuning, but are limited in their usefulness for procedures requiring numerous mutagenesis and selection cycles, especially on large gene sequences.

As discussed above, prior methods for producing improved gene products from randomly mutated genes are of limited utility. One recognized method for producing a wide variety of randomly reassorted gene sequences uses enzymes to cleave a long nucleotide chain into shorter pieces. The cleaving agents are then separated from the genetic material, and the material is amplified in such a manner that the genetic material is allowed to reassemble as chains of polynucleotides, where their reassembly is either random or according to a specific order (Stemmer, 1994a; Stemmer, 1994b; U.S. Pat. No. 5,605,793, U.S. Pat. No. 5,811,238, U.S. Pat. No. 5,830,721, U.S. Pat. No. 5,928,905, U.S. Pat. No. 6,096,548, U.S. Pat. No. 6,117,679, U.S. Pat. No. 6,165,793, U.S. Pat. No. 6,153,410). A variation of this method uses primers and limited polymerase extensions to generate the fragments prior to reassembly (U.S. Pat. No. 5,965,408, U.S. Pat. No. 6,159,687).

However, both methods have limitations. These methods suffer from being technically complex. This limits the applicability of these methods to facilities that have sufficiently experienced staffs. In addition, there are complications that arise from the reassembly of molecules from fragments, including unintended mutagenesis and the increasing difficulty of the reassembly of large target molecules of increasing size, which limits the utility of these methods for reassembling long polynucleotide strands.

Another limitation of these methods of fragmentation and reassembly-based gene shuffling is encountered when the parental template polynucleotides are increasingly heterogeneous. In the annealing step of those processes, the small polynucleotide fragments depend upon stabilizing forces that result from base-pairing interactions to anneal properly. As the small regions of annealing have limited stabilizing forces due to their short length, annealing of highly complementary sequences is favored over more divergent sequences. In such instances these methods have a strong tendency to regenerate the parental template polynucleotides due to annealing of complementary single-strands from a particular parental template. Therefore, the parental templates essentially reassemble themselves, creating a background of unchanged polynucleotides in the library that increases the difficulty of detecting recombinant molecules. This problem becomes increasingly severe as the parental templates become more heterogeneous, that is, as the percentage of sequence identity between the parental templates decreases. This outcome was demonstrated by Kikuchi et al. (Gene 243:133-137, 2000), who attempted to generate recombinants between xylE and nahH using the methods of family shuffling reported previously (Patten et al., 1997; Crameri et al., 1998; Harayama, 1998; Kumamaru et al., 1998; Chang et al., 1999; Hansson et al., 1999). Kikuchi et al. found that essentially no recombinants (<1%) were generated. They also disclosed a method to improve the formation of chimeric genes by fragmentation and reassembly of single-stranded DNAs. Using this method, they obtained chimeric genes at a rate of 14 percent, with the other 86 percent being parental sequences.

The characteristic of low-efficiency recovery of recombinants limits the utility of these methods for generating novel polynucleotides from parental templates with a lower percentage of sequence identity, that is, parental templates that are more diverse. Accordingly, there is a need for a method of generating gene sequences that addresses these needs.

The present invention provides a method that satisfies the aforementioned needs and provides related advantages as well.

BRIEF SUMMARY OF THE INVENTION

The present invention provides a method for reassorting mutations among related polynucleotides, in vitro, by forming heteroduplex molecules and then addressing the mismatches such that sequence information at sites of mismatch is transferred from one strand to the other. In one preferred embodiment, the mismatches are addressed by incubating the heteroduplex molecules in a reaction containing a mismatch nicking enzyme, a polymerase with a 3′ to 5′ proofreading activity in the presence of dNTPs, and a ligase. These respective activities act in concert such that, at a given site of mismatch, the heteroduplex is nicked, unpaired bases are excised then replaced using the opposite strand as a template, and nicks are sealed. Output polynucleotides are amplified before cloning, or cloned directly and tested for improved properties. Additional cycles of mismatch resolution reassortment and testing lead to further improvement. Also described are mismatch endonucleases suitable for use in the process.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 depicts the process of Genetic ReAssortment by Mismatch Resolution (GRAMMR). Reassortment is contemplated between two hypothetical polynucleotides differing at at least two nucleotide positions. Annealing between the top strand of A (SEQ ID NO:18; 5′-AGATCGATCAATTG-3′) and the bottom strand of B (fully complementary strand of SEQ ID NO:19; SEQ ID NO:19 is 5′-AGACCGATCGATTG-3′) is shown (labeled HETERODUPLEX) which results in mismatches at the two positions. After the process of reassortment by mismatch resolution, four distinct product polynucleotides are seen, the parental types A (SEQ ID NO:18 and its fully complementary strand) and B (SEQ ID NO:19 and its fully complementary strand), and the reassorted products C (SEQ ID NO:20; 5′-AGATCGATCGATTG-3′ and its fully complementary strand) and D (SEQ ID NO:21; 5′-AGACCGATCAATTG-3′ and its fully complementary strand).
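The outcome shown in FIG. 1 can be illustrated with a short Python sketch. It is an idealized model and not the enzymatic process itself: it assumes equal-length sequences that differ only by substitutions, and that every mismatch site is resolved independently and completely toward one parent or the other. The function names are hypothetical and are introduced only for illustration.

```python
from itertools import product

def mismatch_positions(a: str, b: str) -> list[int]:
    """Return 0-based positions at which two equal-length top strands differ."""
    if len(a) != len(b):
        raise ValueError("this sketch only handles equal-length sequences")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

def resolution_products(a: str, b: str) -> list[str]:
    """Enumerate top-strand sequences obtainable if each mismatch site is
    resolved independently to the base of parent a or parent b (assumption)."""
    sites = mismatch_positions(a, b)
    products = set()
    # For every site, choose which parent donates the base at that site.
    for donors in product((a, b), repeat=len(sites)):
        seq = list(a)
        for site, donor in zip(sites, donors):
            seq[site] = donor[site]
        products.add("".join(seq))
    return sorted(products)

A = "AGATCGATCAATTG"  # SEQ ID NO:18
B = "AGACCGATCGATTG"  # SEQ ID NO:19

for seq in resolution_products(A, B):
    print(seq)
```

Under these assumptions, two mismatch sites yield four sequences: the two parental types and the two reassorted products labeled C and D in FIG. 1.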

FIG. 2 depicts an exemplary partially complementary nucleic acid population of two molecules. FIG. 2A shows the sequence of two nucleic acid molecules “X” (SEQ ID NO:16; 5′-AGATCAATTG-3′ and its fully complementary strand) and “Y” (SEQ ID NO:17; 5′-AGACCGATTG-3′ and its fully complementary strand) having completely complementary top/bottom strands 1+/2− and 3+/4−, respectively. The positions of differing nucleotides between the nucleic acids X and Y are indicated (*). FIG. 2B shows possible combinations of single strands derived from nucleic acids X and Y after denaturing and annealing and indicates which of those combinations would comprise a partially complementary nucleic acid population of two.

DEFINITIONS

As used herein the term “amplification” refers to a process where the number of copies of a polynucleotide is increased.

As used herein, “annealing” refers to the formation of at least partially double stranded nucleic acid by hybridization of at least partially complementary nucleotide sequences. A partially double stranded nucleic acid can be due to the hybridization of a smaller nucleic acid strand to a longer nucleic acid strand, where the smaller nucleic acid is 100% identical to a portion of the larger nucleic acid. A partially double stranded nucleic acid can also be due to the hybridization of two nucleic acid strands that do not share 100% identity but have sufficient homology to hybridize under a particular set of hybridization conditions.

As used herein, “clamp” refers to a unique nucleotide sequence added to one end of a polynucleotide, such as by incorporation of the clamp sequence into a PCR primer. The clamp sequences are intended to allow amplification only of polynucleotides that arise from hybridization of strands from different parents (i.e., heteroduplex molecules) thereby ensuring the production of full-length hybrid products as described previously (Skarfstad, J. Bact, vol 182, No 11, P. 3008-3016).

As used herein the term “cleaving” means digesting the polynucleotide with enzymes or otherwise breaking phosphodiester bonds within the polynucleotide.

As used herein the term “complementary basepair” refers to the correspondence of DNA (or RNA) bases in the double helix such that adenine in one strand is opposite thymine (or uracil) in the other strand and cytosine in one strand is opposite guanine in the other.

As used herein the term “complementary to” means that the complementary sequence is identical to the reverse-complement of all or a portion of a reference polynucleotide sequence or that each nucleotide in one strand is able to form a base-pair with a nucleotide, or analog thereof, in the opposite strand. For illustration, the nucleotide sequence “TATAC” is complementary to a reference sequence “GTATA”.
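The “TATAC”/“GTATA” illustration can be checked with a minimal Python sketch. It assumes plain DNA letters (A, C, G, T) and covers only the first branch of the definition, identity to the reverse-complement of all or a portion of the reference; nucleotide analogs and RNA are not modeled. The function names are hypothetical.

```python
# Watson-Crick partners for the four DNA bases (analogs and RNA not handled).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse so the result reads 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def is_complementary_to(candidate: str, reference: str) -> bool:
    """True if candidate is the reverse-complement of all or a portion of
    reference, i.e. its reverse-complement occurs within reference."""
    if not candidate:
        return False
    return reverse_complement(candidate) in reference.upper()

print(reverse_complement("GTATA"))
print(is_complementary_to("TATAC", "GTATA"))
```

Both sequences are written 5′ to 3′, which is why the complement must also be reversed before comparison.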

As used herein, “denaturing” or “denatured,” when used in reference to nucleic acids, refers to the conversion of a double stranded nucleic acid to a single stranded nucleic acid. Methods of denaturing double stranded nucleic acids are well known to those skilled in the art, and include, for example, addition of agents that destabilize base-pairing, increasing temperature, decreasing salt, or combinations thereof. These factors are applied according to the complementarity of the strands, that is, whether the strands are 100% complementary or have one or more non-complementary nucleotides.

As used herein the term “desired functional property” means a phenotypic property, which includes but is not limited to, encoding a polypeptide, promoting transcription of linked polynucleotides, binding a protein, improving the function of a viral vector, and the like, which can be selected or screened for. Polynucleotides with such desired functional properties can be used in a number of ways, which include but are not limited to expression from a suitable plant, animal, fungal, yeast, or bacterial expression vector, integration to form a transgenic plant, animal or microorganism, expression of a ribozyme, and the like.

As used herein the term “DNA shuffling” indicates recombination between substantially homologous but non-identical sequences.

As used herein, the term “effective amount” refers to the amount of an agent necessary for the agent to provide its desired activity. For the present invention, this determination is well within the knowledge of those of ordinary skill in the art.

As used herein the term “exonuclease” refers to an enzyme that cleaves nucleotides one at a time from an end of a polynucleotide chain, that is, an enzyme that hydrolyzes phosphodiester bonds from either the 3′ or 5′ terminus of a polynucleotide molecule. Such exonucleases include, but are not limited to, T4 DNA polymerase, T7 DNA polymerase, E. coli Pol I, and Pfu DNA polymerase. The term “exonuclease activity” refers to the activity associated with an exonuclease. An exonuclease that hydrolyzes in a 3′ to 5′ direction is said to have “3′ to 5′ exonuclease activity.” Similarly an exonuclease with 5′ to 3′ activity is said to have “5′ to 3′ exonuclease activity.” It is noted that some exonucleases are known to have both 3′ to 5′ and 5′ to 3′ activity, such as E. coli Pol I.

As used herein, “Genetic Reassortment by Mismatch Resolution (GRAMMR)” refers to a method for reassorting sequence variations among related polynucleotides by forming heteroduplex molecules and then addressing the mismatches such that information is transferred from one strand to the other.

As used herein, “granularity” refers to the amount of a nucleic acid's sequence information that is transferred as a contiguous sequence from a template polynucleotide strand to a second polynucleotide strand. As used herein, “template sequence” refers to a first single stranded polynucleotide sequence that is partially complementary to a second polynucleotide sequence such that treatment by GRAMMR results in transfer of genetic information from the template strand to the second strand.

The larger the units of sequence information transferred from a template strand, the higher the granularity. The smaller the blocks of sequence information transferred from the template strand, the lower or finer the granularity. Lower granularity indicates that a DNA shuffling or reassortment method is able to transfer smaller discrete blocks of genetic information from the template strand to the second strand. The advantage of a DNA shuffling or reassortment method with lower granularity is that it is able to resolve smaller nucleic acid sequences from others, and to transfer the sequence information. DNA shuffling or reassortment methods that return primarily high granularity are not readily able to resolve smaller nucleic acid sequences from others.

As used herein the term “heteroduplex polynucleotide” refers to a double helix polynucleotide formed by annealing single strands, typically separate strands, where the strands are non-identical. A heteroduplex polynucleotide may have unpaired regions existing as single strand loops or bubbles. A heteroduplex polynucleotide region can also be formed by one single-strand polynucleotide wherein partial self-complementarity allows the formation of a stem-loop structure where the annealing portion of the strand is non-identical.

As used herein the term “heteroduplex DNA” refers to a DNA double helix formed by annealing single strands, typically separate strands, where the strands are non-identical. A heteroduplex DNA may have unpaired regions existing as single strand loops or bubbles. A heteroduplex DNA region can also be formed by one single-strand polynucleotide wherein partial self-complementarity allows the formation of a stem-loop structure where the annealing portion of the strand is non-identical.

As used herein the term “homologous” means that one single-stranded nucleic acid sequence may hybridize to an at least partially complementary single-stranded nucleic acid sequence. The degree of hybridization may depend on a number of factors including the amount of identity between the sequences and the hybridization conditions such as temperature and salt concentrations as discussed later.

Nucleic acids are “homologous” when they are derived, naturally or artificially, from a common ancestor sequence. During natural evolution, this occurs when two or more descendent sequences diverge from a parent sequence over time, i.e., due to mutation and natural selection. Under artificial conditions, divergence occurs, e.g., in one of two basic ways. First, a given sequence can be artificially recombined with another sequence, as occurs, e.g., during typical cloning, to produce a descendent nucleic acid, or a given sequence can be chemically modified, or otherwise manipulated to modify the resulting molecule. Alternatively, a nucleic acid can be synthesized de novo, by synthesizing a nucleic acid that varies in sequence from a selected parental nucleic acid sequence. When there is no explicit knowledge about the ancestry of two nucleic acids, homology is typically inferred by sequence comparison between two sequences. Where two nucleic acid sequences show sequence similarity over a significant portion of each of the nucleic acids, it is inferred that the two nucleic acids share a common ancestor. The precise level of sequence similarity that establishes homology varies in the art depending on a variety of factors.

For purposes of this disclosure, two nucleic acids are considered homologous where they share sufficient sequence identity to allow GRAMMR-mediated information transfer to occur between the two nucleic acid molecules.

As used herein the term “identical” or “identity” means that two nucleic acid sequences have the same sequence or a complementary sequence. Thus, “areas of identity” means that regions or areas of a polynucleotide or the overall polynucleotide are identical or complementary to areas of another polynucleotide.

As used herein the term “increase in percent complementarity” means that the percentage of complementary base-pairs in a heteroduplex molecule is made larger.
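As a rough illustration of this quantity, the following Python sketch computes the percentage of complementary base-pairs in a duplex. It assumes two equal-length DNA strands, each written 5′ to 3′, with no insertions, deletions, loops, or analogs; the function names are hypothetical. The sequences are those of FIG. 1.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the fully complementary strand, written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]

def percent_complementarity(top: str, bottom: str) -> float:
    """Percentage of positions at which the top strand pairs with the
    antiparallel bottom strand (both given 5' to 3', equal length assumed)."""
    if len(top) != len(bottom):
        raise ValueError("this sketch only handles equal-length strands")
    # Base i of the top strand lies opposite base (n - 1 - i) of the bottom.
    paired = sum(
        1 for t, b in zip(top, reversed(bottom))
        if t.translate(_COMPLEMENT) == b
    )
    return 100.0 * paired / len(top)

A_TOP = "AGATCGATCAATTG"                        # SEQ ID NO:18
B_BOTTOM = reverse_complement("AGACCGATCGATTG")  # complement of SEQ ID NO:19

before = percent_complementarity(A_TOP, B_BOTTOM)                   # heteroduplex
after = percent_complementarity(A_TOP, reverse_complement(A_TOP))   # fully resolved
print(f"{before:.1f}% -> {after:.1f}%")
```

In this example the heteroduplex has 12 complementary pairs out of 14, and resolving both mismatches raises the value to 100 percent. Resolving only one mismatch would give an intermediate value, consistent with the statement that not all mismatches are necessarily resolved.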

As used herein the term “ligase” refers to an enzyme that rejoins a broken phosphodiester bond in a nucleic acid.

As used herein the term “mismatch” refers to a base-pair that is unable to form normal base-pairing interactions (i.e., other than “A” with “T” (or “U”), or “G” with “C”).

As used herein the term “mismatch resolution” refers to the conversion of a mismatched base-pair into a complementary base-pair. The term also encompasses the conversion of insertions or deletions in a heteroduplex into base-paired homoduplex.

As used herein the term “mismatch endonuclease” or “mismatch-directed endonuclease” refers to an enzyme that is able to both recognize a mismatch in a heteroduplex polynucleotide and cut one strand of the heteroduplex at or within a few bases of the mismatch.

As used herein the term “mutations” means changes in the sequence of a wild-type or reference nucleic acid sequence or changes in the sequence of a polypeptide. Such mutations can be point mutations such as transitions or transversions. The mutations can be deletions, insertions or duplications.

As used herein the term “nick translation” refers to the property of a polymerase where the combination of a 5′-to-3′ exonuclease activity with a 5′-to-3′ polymerase activity allows the location of a single-strand break in a double-stranded polynucleotide (a “nick”) to move in the 5′-to-3′ direction.

As used herein, the term “nucleic acid” or “nucleic acid molecule” means a polynucleotide such as deoxyribonucleic acid (DNA) or ribonucleic acid (RNA) and encompasses single-stranded and double-stranded nucleic acid as well as an oligonucleotide. Nucleic acids useful in the invention include genomic DNA, cDNA, mRNA and synthetic oligonucleotides, and can represent the sense strand, the anti-sense strand, or both. A nucleic acid generally incorporates the four naturally occurring nucleotides adenine, guanine, cytosine, and thymidine/uridine. An invention nucleic acid can also incorporate other naturally occurring or non-naturally occurring nucleotides, including derivatives thereof, so long as the nucleotide derivatives can be incorporated into a polynucleotide by a polymerase at an efficiency sufficient to generate a desired polynucleotide product.

As used herein, a “parental nucleic acid” refers to a double stranded nucleic acid having a sequence that is 100% identical to an original single stranded nucleic acid in a starting population of partially complementary nucleic acids. Parental nucleic acids would include, for example in the illustration of FIG. 2, nucleic acids X and Y if partially complementary nucleic acid combinations 1+/4− or 2−/3+ were used as a starting population in an invention method.

As used herein, “partially complementary” refers to a nucleic acid having a substantially complementary sequence to another nucleic acid but that differs from the other nucleic acid by at least two or more nucleotides. As used herein, “partially complementary nucleic acid population” refers to a population of nucleic acids comprising nucleic acids having substantially complementary sequences but no nucleic acids having an exact complementary sequence for any other member of the population. As used herein, any member of a partially complementary nucleic acid population differs from another nucleic acid of the population, or the complement thereto, by two or more nucleotides. As such, a partially complementary nucleic acid population specifically excludes a population containing sequences that are exactly complementary, that is, a complementary sequence that has 100% complementarity. Therefore, each member of such a partially complementary nucleic acid population differs from other members of the population by two or more nucleotides, including both strands. One strand is designated the top strand, and its complement is designated the bottom strand. As used herein, “top” strand refers to a polynucleotide read in the 5′ to 3′ direction and the “bottom” its complement. It is understood that, while a sequence is referred to as bottom or top strand, such a designation is intended to distinguish complementary strands since, in solution, there is no orientation that fixes a strand as a top or bottom strand.

For example, a population containing two nucleic acid members can be derived from two double stranded nucleic acids, with a potential of using any of the four strands to generate a single stranded partially complementary nucleic acid population. An example of potential combinations of strands of two nucleic acids that can be used to obtain a partially complementary nucleic acid population of the invention is shown in FIG. 2. The two nucleic acid sequences that are potential members of a partially complementary nucleic acid population are designated “X” and “Y” (FIG. 2A). The nucleic acid sequences differ at two positions (positions 4 and 6 indicated by “*”). The “top” strands of nucleic acids X and Y are designated “1+” and “3+,” respectively, and the “bottom” strands of nucleic acids X and Y are designated “2−” and “4−,” respectively.

FIG. 2B shows the possible combinations of the four nucleic acid strands. Of the six possible strand combinations, only the combinations 1+/2−, 1+/4−, 2−/3+, or 3+/4− comprise the required top and bottom strand of a partially complementary nucleic acid population. Of these top/bottom sequence combinations, only 1+/4− or 2−/3+ comprise an example of a partially complementary nucleic acid population of two different molecules because only these combinations have complementary sequences that differ by at least one nucleotide. The remaining combinations, 1+/2− and 3+/4−, contain exactly complementary sequences and therefore do not comprise a partially complementary nucleic acid population of the invention.
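The classification of the six strand pairings can be reproduced with a brief Python sketch. It is only an illustration built on the X and Y sequences of FIG. 2A; the strand labels use an ASCII hyphen in place of the minus sign, and the helper names are hypothetical. The sketch assumes that any top/bottom pair that is not an exact complement is partially complementary, which holds for this example but is not a general test of homology.

```python
from itertools import combinations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the fully complementary strand, written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]

X_TOP = "AGATCAATTG"  # SEQ ID NO:16
Y_TOP = "AGACCGATTG"  # SEQ ID NO:17

# Strand labels follow FIG. 2: "+" marks a top strand, "-" a bottom strand.
STRANDS = {
    "1+": X_TOP,
    "2-": reverse_complement(X_TOP),
    "3+": Y_TOP,
    "4-": reverse_complement(Y_TOP),
}

def classify(name_a: str, name_b: str) -> str:
    """Classify a pair of strands as in the discussion of FIG. 2B."""
    if name_a[-1] == name_b[-1]:
        return "same sense"  # two tops or two bottoms cannot pair
    top, bottom = (name_a, name_b) if name_a.endswith("+") else (name_b, name_a)
    if STRANDS[bottom] == reverse_complement(STRANDS[top]):
        return "exactly complementary"
    return "partially complementary"

RESULT = {f"{a}/{b}": classify(a, b) for a, b in combinations(STRANDS, 2)}
for pair, label in RESULT.items():
    print(pair, label)
```

Of the six pairings, two are same-sense, two are exact complements that re-form the parental duplexes, and two (1+/4− and 2−/3+) form the mismatched heteroduplexes that serve as substrates for mismatch resolution.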

In the above-described example of a population of two different molecules, a partially complementary population of nucleic acid molecules excludes combinations of strands that differ by one or more nucleotides but which are the same sense, for example, 1+/3+ or 2−/4−. However, it is understood that such a combination of same stranded nucleic acids can be included in a larger population, so long as the population contains at least one bottom strand and at least one top strand. For example, if a third nucleic acid “Z,” with strands 5+ and 6−, is included, the combinations 1+/3+/6− or 2−/4−/5+ would comprise a partially complementary nucleic acid population. Similarly, any number of nucleic acids and their corresponding top and bottom strands can be combined to generate a partially complementary nucleic acid population of the invention so long as the population contains at least one top strand and at least one bottom strand and so long as the population contains no members that are the exact complement.

The populations of nucleic acids of the invention can be about 3 or more, about 4 or more, about 5 or more, about 6 or more, about 7 or more, about 8 or more, about 9 or more, about 10 or more, about 12 or more, about 15 or more, about 20 or more, about 25 or more, about 30 or more, about 40 or more, about 50 or more, about 75 or more, about 100 or more, about 150 or more, about 200 or more, about 250 or more, about 300 or more, about 350 or more, about 400 or more, about 450 or more, about 500 or more, or even about 1000 or more different nucleic acid molecules. A population can also contain about 2000 or more, about 5000 or more, about 1×10⁴ or more, about 1×10⁵ or more, about 1×10⁶ or more, about 1×10⁷ or more, or even about 1×10⁸ or more different nucleic acids. One skilled in the art can readily determine a desirable population to include in invention methods depending on the nature of the desired reassortment experiment outcome and the available screening methods, as disclosed herein.

As used herein, a “polymerase” refers to an enzyme that catalyzes the formation of polymers of nucleotides, that is, polynucleotides. A polymerase useful in the invention can be derived from any organism or source, including animal, plant, bacterial and viral polymerases. A polymerase can be a DNA polymerase, RNA polymerase, or a reverse transcriptase capable of transcribing RNA into DNA.

As used herein the term “proofreading” describes the property of an enzyme where a nucleotide, such as a mismatch nucleotide, can be removed by a 3′-to-5′ exonuclease activity and replaced by, typically, a base-paired nucleotide.

As used herein, a “recombinant” polynucleotide refers to a polynucleotide that comprises sequence information from at least two different polynucleotides.

As used herein the term “related polynucleotides” means that regions or areas of the polynucleotides are identical and regions or areas of the polynucleotides are non-identical.

As used herein the term DNA “reassortment” indicates a redistribution of sequence variations between substantially homologous but non-identical sequences.

As used herein the term “replicon” refers to a genetic unit of replication including a length of polynucleotide and its site for initiation of replication.

As used herein the term “sequence diversity” refers to the abundance of non-identical polynucleotides. The term “increasing sequence diversity in a population” means to increase the abundance of non-identical polynucleotides in a population.

As used herein the term “sequence variant” refers to a molecule (DNA, RNA, polypeptide, and the like) with one or more sequence differences compared to a reference molecule. For example, the sum of the separate independent mismatch resolution events that occur throughout the heteroduplex molecule during the GRAMMR process results in reassortment of sequence information throughout that molecule. The sequence information will reassort in a variety of combinations to generate a complex library of “sequence variants”.

As used herein the term “strand cleavage activity” or “cleavage” refers to the breaking of a phosphodiester bond in the backbone of the polynucleotide strand, as in forming a nick. Strand cleavage activity can be provided by an enzymatic agent, such agents include, but are not limited to CEL I, RES I, T4 endonuclease VII, T7 endonuclease I, S1 nuclease, BAL-31 nuclease, FEN1, cleavase, pancreatic DNase I, SP nuclease, mung bean nuclease, and nuclease P1; by a chemical agent, such agents include, but are not limited to potassium permanganate, tetraethylammonium acetate, sterically bulky photoactivatable DNA intercalators, [Rh(bpy)2(chrysi)]3+, osmium tetroxide with piperidine, and hydroxylamine with piperidine; or by energy in the form of ionizing radiation, or kinetic radiation.

As used herein the term “sufficient time” refers to the period of time necessary for a reaction or process to render a desired product. For the present invention, the determination of sufficient time is well within the knowledge of those of ordinary skill in the art. It is noted that “sufficient time” can vary widely, depending on the desires of the practitioner, without impacting on the functionality of the reaction, or the quality of the desired product.

As used herein the term “wild-type” means that a nucleic acid fragment does not contain any mutations. A “wild-type” protein means that the protein will be active at a level of activity found in nature and typically will be the amino acid sequence found in nature. In an aspect, the term “wild type” or “parental sequence” can indicate a starting or reference sequence prior to a manipulation of the invention.

In the polypeptide notation used herein, the left-hand direction is the amino terminal direction and the right-hand direction is the carboxy-terminal direction, in accordance with standard usage and convention. Similarly, unless specified otherwise, the left-hand end of single-stranded polynucleotide sequences is the 5′ end; the left-hand direction of double-stranded polynucleotide sequences is referred to as the 5′ direction. The direction of 5′ to 3′ addition of nascent RNA transcripts is referred to as the transcription direction.

DETAILED DESCRIPTION OF THE INVENTION

The present invention provides an in vitro method of making sequence variants from at least one heteroduplex polynucleotide wherein the heteroduplex has at least two non-complementary nucleotide base pairs, the method comprising: preparing at least one heteroduplex polynucleotide; combining said heteroduplex polynucleotide with an effective amount of an agent or agents with exonuclease activity, polymerase activity and strand cleavage activity; and allowing sufficient time for the percentage of complementarity to increase, wherein at least one or more variants are made.

Another aspect of the present invention is where the heteroduplex polynucleotides are circular, linear or a replicon.

Another aspect of the present invention is where the desired variants have different amounts of complementarity.

Another aspect of the present invention is where the exonuclease activity, polymerase activity, and strand cleavage activity are added sequentially or concurrently.

Another aspect of the present invention provides the addition of ligase activity, provided by agents such as T4 DNA ligase, E. coli DNA ligase, or Taq DNA ligase.

Another aspect of the present invention is where the strand cleavage activity is provided by an enzyme, such as CEL I, RES I, T4 endonuclease VII, T7 endonuclease I, S1 nuclease, BAL-31 nuclease, FEN1, cleavase, pancreatic DNase I, SP nuclease, mung bean nuclease, and nuclease P1; a chemical agent, such as potassium permanganate, tetraethylammonium acetate, sterically bulky photoactivatable DNA intercalators, [Rh(bpy)2(chrysi)]3+, osmium tetroxide with piperidine, and hydroxylamine with piperidine; or a form of energy, such as ionizing or kinetic radiation.

Another aspect of the present invention is where polymerase activity is provided by Pol beta.

Another aspect of the present invention is where both polymerase activity and 3′ to 5′ exonuclease activity are provided by T4 DNA polymerase, T7 DNA polymerase, E. coli Pol I, or Pfu DNA polymerase.

Another aspect of the present invention is where the agent with both polymerase activity and 5′ to 3′ exonuclease activity is E. coli Pol I.

An embodiment of the present invention is where the effective amount of strand cleavage activity, and exonuclease activity/polymerase activity and ligase activity are provided by RES I, T4 DNA polymerase, and T4 DNA ligase.

Another aspect of the present invention is where the effective amount of strand cleavage activity, and exonuclease activity/polymerase activity and ligase activity are provided by RES I, T7 DNA polymerase, and T4 DNA ligase.

Another embodiment of the present invention provides an in vitro method of increasing diversity in a population of sequences, comprising: preparing at least one heteroduplex polynucleotide; combining the heteroduplex polynucleotide with an effective amount of an agent or agents with 3′ to 5′ exonuclease activity, polymerase activity and strand cleavage activity; and allowing sufficient time for the percentage of complementarity to increase, wherein diversity in the population is increased.

Another embodiment of the present invention provides a method of obtaining a polynucleotide encoding a desired functional property, comprising: preparing at least one heteroduplex polynucleotide; combining said heteroduplex polynucleotide with an effective amount of an agent or agents with exonuclease activity, polymerase activity and strand cleavage activity; allowing sufficient time for the percentage of complementarity between strands of the heteroduplex polynucleotide to increase, wherein diversity in the population is increased; and screening or selecting a population of variants for the desired functional property.

Another embodiment of the present invention provides a method of obtaining a polynucleotide encoding a desired functional property, comprising: preparing at least one heteroduplex polynucleotide; combining said heteroduplex polynucleotide with an effective amount of an agent or agents with exonuclease activity, polymerase activity and strand cleavage activity; allowing sufficient time for the percentage of complementarity between strands of the heteroduplex polynucleotide to increase, wherein diversity in the population is increased; converting DNA to RNA; and screening or selecting a population of ribonucleic acid variants for the desired functional property.

Yet another embodiment of the present invention provides a method of obtaining a polypeptide having a desired functional property, comprising: preparing at least one heteroduplex polynucleotide; combining said heteroduplex polynucleotide with an effective amount of an agent or agents with exonuclease activity, polymerase activity and strand cleavage activity; allowing sufficient time for the percentage of complementarity between strands of said heteroduplex polynucleotide to increase; converting said heteroduplex polynucleotide to RNA, and said RNA to a polypeptide; and screening or selecting a population of polypeptide variants for said desired functional property.

Still another embodiment of the present invention provides a method of obtaining a polynucleotide encoding a desired functional property, comprising: preparing at least one heteroduplex polynucleotide, where the heteroduplex is optionally about 95%, 90%, 85%, 80%, or 75% identical, and about 1000 KB, 10,000 KB, or 100,000 KB in size; combining said heteroduplex polynucleotide with an effective amount of an agent or agents with exonuclease activity, polymerase activity and strand cleavage activity; allowing sufficient time for the percentage of complementarity between strands of the heteroduplex polynucleotide to increase; screening or selecting for a population of variants having a desired functional property; denaturing said population of variants to obtain single strand polynucleotides; annealing said single strand polynucleotides to form at least one second heteroduplex polynucleotide; combining said second heteroduplex polynucleotide with an effective amount of an agent or agents with exonuclease activity, polymerase activity and strand cleavage activity; and allowing sufficient time for the percentage of complementarity between strands of the heteroduplex polynucleotide to increase.

The present invention is directed to a method for generating an improved polynucleotide sequence or a population of improved polynucleotide sequences, typically in the form of amplified and/or cloned polynucleotides, whereby the improved polynucleotide sequence(s) possess at least one desired phenotypic characteristic (e.g., encodes a polypeptide, promotes transcription of linked polynucleotides, binds a protein, improves the function of a viral vector, and the like) which can be selected or screened for. Such desired polynucleotides can be used in a number of ways such as expression from a suitable plant, animal, fungal, yeast, or bacterial expression vector, integration to form a transgenic plant, animal or microorganism, expression of a ribozyme, and the like.

GRAMMR provides for a process where heteroduplexed DNA strands are created by annealing followed by resolution of mismatches in an in vitro reaction. This reaction begins with cleavage of one strand or the other at or near a mismatch followed by excision of mismatched bases from that strand and polymerization to fill in the resulting gap with nucleotides that are templated to the sequence of the other strand. The resulting nick can be sealed by ligation to rejoin the backbone. The sum of the separate independent mismatch resolution events that occur throughout the heteroduplex molecule will result in reassortment of sequence information throughout that molecule. The sequence information will reassort in a variety of combinations to generate a complex library of sequence variants.
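By way of illustration only, the reassortment described above can be sketched as a toy simulation. The sketch assumes two equal-length, gap-free parental sequences written in the same (coding-strand) sense, complete resolution of every mismatch, and an independent, unbiased choice of template strand at each mismatch; none of these assumptions is a limitation of the method, and the function names are hypothetical.

```python
import random


def resolve_mismatches(parent_a, parent_b, rng):
    """Toy model of mismatch resolution on one heteroduplex.

    At each mismatched position one strand is picked at random to serve as
    template, so the resolved duplex carries that parent's base there.
    Assumption: sites resolve independently and with equal probability.
    """
    if len(parent_a) != len(parent_b):
        raise ValueError("toy model assumes equal-length, gap-free parents")
    resolved = []
    for base_a, base_b in zip(parent_a, parent_b):
        if base_a == base_b:
            resolved.append(base_a)  # matched position: nothing to resolve
        else:
            resolved.append(rng.choice((base_a, base_b)))
    return "".join(resolved)


def make_library(parent_a, parent_b, n_molecules, seed=0):
    """Resolve n_molecules heteroduplexes and return the distinct variants."""
    rng = random.Random(seed)
    return {resolve_mismatches(parent_a, parent_b, rng) for _ in range(n_molecules)}
```

Under these assumptions, two parents differing at k positions can yield at most 2^k distinct output sequences, including the two parental sequences themselves.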

In one embodiment of GRAMMR, a library of mutants is generated by any method known in the art such as mutagenic PCR, chemical mutagenesis, etc. followed by screening or selection for mutants with a desired property. DNA is prepared from the chosen mutants. The DNAs of the mutants are mixed, denatured to single strands, and allowed to anneal. Partially complementary strands that hybridize will have non-base-paired nucleotides at the sites of the mismatches. Treatment with CEL I (Oleykowski et al., 1998; Yang et al., 2000), or a similar mismatch-directed activity, such as RES I, will cause nicking of one or the other polynucleotide strand 3′ of each mismatch. (In addition, CEL I or RES I can nick 3′ of an insertion/deletion resulting in reassortment of insertions/deletions.) The presence of a polymerase containing a 3′-to-5′ exonuclease (“proofreading”) activity (e.g., T4 DNA Pol) will allow excision of the mismatch, and subsequent 5′-to-3′ polymerase activity will fill in the gap using the other strand as a template. A polymerase that lacks 5′-3′ exonuclease activity and strand-displacement activity will fill in the gap and will cease to polymerize when it reaches the 5′ end of DNA located at the original CEL I cleavage site, thus re-synthesizing only short patches of sequence. Alternatively, the length of the synthesized patches can be modulated by spiking the reaction with a polymerase that contains a 5′-3′ exonuclease activity; this nick-translation activity can traverse a longer region resulting in a longer patch of information transferred from the template strand. DNA ligase (e.g., T4 DNA ligase) can then seal the nick by restoring the phosphate backbone of the repaired strand. This process can occur simultaneously at many sites and on either strand of a given heteroduplexed DNA molecule. The result is a randomization of sequence differences among input strands to give a population of sequence variants that is more diverse than the population of starting sequences. These output polynucleotides can be cloned directly into a suitable vector, or they can be amplified by PCR before cloning. Alternatively, the reaction can be carried out on heteroduplexed regions within the context of a double-stranded circular plasmid molecule or other suitable replicon that can be directly introduced into the appropriate host following the GRAMMR reaction.

In another alternative, the output polynucleotides can be transcribed into RNA polynucleotides and used directly, for example, by inoculation of a plant viral vector onto a plant, such as in the instance of a viral vector transcription plasmid. The resulting clones are subjected to a selection or a screen for improvements in a desired property. The overall process can then be repeated one or more times with the selected clones in an attempt to obtain additional improvements.

If the output polynucleotides are cloned directly, there is the possibility of incompletely resolved molecules persisting that, upon replication in the cloning host, could lead to two different plasmids in the same cell. These plasmids could potentially give rise to mixed-plasmid colonies. If it is desired to avoid such a possibility, the output polynucleotide molecules can be grown in the host to allow replication/resolution, the polynucleotides isolated and retransformed into new host cells.

In another embodiment, when sequence input from more than two parents per molecule is desired, the above procedure is performed in a cyclic manner before any cloning of output polynucleotides. After GRAMMR treatment, the double stranded polynucleotides are denatured, allowed to anneal, and the mismatch resolution process is repeated. After a desired number of such cycles, the output polynucleotides can be cloned directly, introduced into a suitable vector, or they can be amplified by PCR before cloning. The resulting clones are subjected to a selection or a screen for improvements in a desired property.

In another embodiment, a “molecular backcross” is performed to help eliminate the background of deleterious mutations from the desired mutations. A pool of desired mutants' DNA can be mixed with an appropriate ratio of wild-type DNA to perform the method. Clones can be selected for improvement, pooled, and crossed back to wild-type again until there is no further significant change.

The efficiency of the process is improved by various methods of enriching the starting population for heteroduplex molecules, thus reducing the number of unaltered parental-type output molecules. The mismatched hybrids can be affinity purified using aptamers, dyes, or other agents that bind to mismatched DNA. A preferred embodiment is the use of MutS protein affinity matrix (Wagner et al., Nucleic Acids Res. 23(19):3944-3948 (1995); Su et al., Proc. Natl. Acad. Sci. (U.S.A.), 83:5057-5061 (1986)) or mismatch-binding but non-cleaving mutants of phage T4 endonuclease VII (Golz and Kemper, Nucleic Acids Research, 1999; 27: e7).

In one embodiment, the procedure is modified so that the input polynucleotides consist of a single strand of each sequence variant. For example, single-stranded DNAs of opposite strandedness are produced from the different parent sequences by asymmetric PCR to generate partially complementary single-stranded molecules. Annealing of the strands with one another to make heteroduplex is performed as described in Example 1. Alternatively, single-stranded DNAs can be generated by preferentially digesting one strand of each parental double-stranded DNA with Lambda exonuclease followed by annealing the remaining strands to one another. In this embodiment, the annealing strands have no 100% complementary strand present with which to re-anneal. Hence, there is a lower background of unmodified polynucleotides, that is, “parental polynucleotides”, among the output polynucleotides, leading to a higher efficiency of reassorting sequence variations. This increased efficiency will be particularly valuable in situations where a screen rather than a selection is employed to test for the desired polynucleotides.

Another method for heteroduplex formation is to mix the double-stranded parent DNAs, denature to dissociate the strands, and allow the single-stranded DNAs to anneal to one another to generate a population of heteroduplexes and parental homoduplexes. The heteroduplexes can then be selectively enriched by a heteroduplex capture method such as those described above using MutS or a non-cleaving T4 endonuclease VII mutant. Alternatively, the parental homoduplex molecules in the population may be cleaved by restriction enzymes that overlap with sites of mismatch such that they are not cleaved in the heteroduplex but are cleaved in the parental homoduplex molecules. Uncleaved heteroduplex DNA can then be isolated by size fractionation in an agarose gel, as was performed to generate full-length plasmid on full-length plasmid heteroduplex DNA molecules as described in Example 6. Circularization of those full-length heteroduplexed plasmid molecules was then brought about by incubation with DNA ligase.
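As a rough guide to why enrichment can be worthwhile, the fraction of heteroduplexes expected after mixing, denaturing, and annealing can be estimated with a short calculation. The sketch below assumes that complementary strands pair at random in proportion to their abundance and that sequence differences do not bias annealing; real annealing may deviate from this, and the function name is hypothetical.

```python
def heteroduplex_fraction(parent_amounts):
    """Expected fraction of annealed duplexes that are heteroduplexes.

    parent_amounts: relative molar amounts of each parental duplex.
    Assumption: each top strand pairs with a bottom strand drawn at random
    in proportion to abundance, so the homoduplex fraction is sum(p_i ** 2).
    """
    total = float(sum(parent_amounts))
    if total <= 0:
        raise ValueError("at least one parent must be present")
    fractions = [amount / total for amount in parent_amounts]
    return 1.0 - sum(p * p for p in fractions)


if __name__ == "__main__":
    print(heteroduplex_fraction([1, 1]))        # two parents at equal amounts
    print(heteroduplex_fraction([1, 1, 1, 1]))  # four parents at equal amounts
```

Under these assumptions, two parents mixed in equal amounts give about half heteroduplex and half parental homoduplex, which is the background that the capture and restriction-digestion approaches described above are intended to reduce.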

In another embodiment, the parental, or input, double-stranded polynucleotides are modified by the addition of “clamp” sequences. One input polynucleotide or pool of polynucleotides is amplified by PCR with the addition of a unique sequence in the 5′ primer. The other input polynucleotide or pool is amplified by PCR with the addition of a unique sequence in the 3′ primer. The clamp sequences can be designed to contain a unique restriction enzyme site for the 5′ end of the gene of interest and another for the 3′ end such that, at the step of cloning the products of the GRAMMR reassortment, only products with the 5′ clamp from the first polynucleotide (or pool) and the 3′ end from the second polynucleotide (or pool) will have appropriate ends for cloning. Alternatively, the products of GRAMMR reassortment can be PCR amplified using the unique sequences of the 5′ and 3′ clamps to achieve a similar result. Hence, there is a lower background of unmodified polynucleotides, that is, “parental polynucleotides”, among the output polynucleotide clones, leading to a higher efficiency of reassorting sequence variations. This increased efficiency will be particularly valuable in situations where a screen rather than a selection is employed to test for the desired polynucleotides. Optionally, oligonucleotide primers can be added to the GRAMMR reaction that are complementary to the clamp primer sequences such that either parent can serve as the top strand, thus permitting both reciprocal heteroduplexes to participate in the mismatch-resolution reaction.
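The selection imposed by the clamps can be pictured with a minimal sketch. It assumes each output duplex is represented by its top-strand sequence as a plain string and that the clamps are simple terminal sequences; the function names and any clamp sequences passed in are hypothetical placeholders, not sequences taken from this disclosure.

```python
def has_cloning_ends(product, five_prime_clamp, three_prime_clamp):
    """True if the product starts with the 5' clamp and ends with the 3' clamp."""
    return product.startswith(five_prime_clamp) and product.endswith(three_prime_clamp)


def select_clonable(products, five_prime_clamp, three_prime_clamp):
    """Keep only products carrying both clamps.

    Parental homoduplexes carry one clamp at most (5' clamp only for the
    first pool, 3' clamp only for the second), so they are filtered out.
    """
    return [
        product
        for product in products
        if has_cloning_ends(product, five_prime_clamp, three_prime_clamp)
    ]
```

The same filter models either route described above: cloning through the unique restriction sites in the clamps, or PCR amplification with primers specific to the two clamp sequences.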

Another method for generating cyclic heteroduplexed polynucleotides is performed where parental double-stranded DNAs have terminal clamp sequences as described above where the single-stranded clamp sequences extending from one end of the heteroduplex are complementary to single-stranded clamp sequences extending from the other end of the heteroduplex. These complementary, single-stranded clamps are allowed to anneal, thereby circularizing the heteroduplexed DNA molecule. Parental homoduplexes that result from re-annealing of identical sequences have only one clamp sequence and therefore no complementary single-stranded sequences at their termini with which circularization can occur. Additionally, a DNA polymerase and a DNA ligase can be used to fill in any gaps in the circular molecules and to seal the nicks in the backbone, respectively, to result in the formation of a population of covalently-closed circular heteroduplex molecules. As the covalently-closed circular heteroduplex molecules will not dissociate into their component strands if subjected to further denaturing conditions, the process of denaturation, circularization, and ligation can be repeated to convert more of the linear double-stranded parental duplexes into closed circular heteroduplexes.

In another embodiment, a region of a single-stranded circular phagemid DNA can be hybridized to a related, but non-identical linear DNA, which can then be extended with a polymerase such as T7 DNA polymerase or T4 DNA polymerase plus T4 gene 32 protein, then ligated at the resulting nick to obtain a circular, double-stranded molecule with heteroduplexed regions at the sites of differences between the DNAs. GRAMMR can then be carried out on this molecule to obtain a library of sequence-reassorted molecules.

Alternatively, two single-stranded circular phagemid DNAs of opposite strand polarity relative to the plasmid backbone, and parent gene sequences that are the target of the reassortment, are annealed to one another. A region of extensive mismatch will occur where the phage f1 origin sequences reside. Upon GRAMMR treatment, however, this region of extensive mismatch can revert to either parental type sequence, restoring a functional f1 origin. These double-stranded molecules will also contain mismatch regions at the sites of differences between the strands encoding the parent genes of interest. GRAMMR can then be carried out on this molecule to obtain a library of sequence-reassorted molecules.

As discussed in the preceding paragraphs, the starting DNA or input DNA can be of any number of forms. For example, input DNA can be full-length, single stranded and of opposite sense, as is taught in Example 1. Alternatively, the input DNA can also be a fragment of the full-length strand. The input DNAs can be double-stranded, either one or both, or modified, such as by methylation, phosphorothiolate linkages, peptide-nucleic acid, substitution of RNA in one or both strands, or the like. Either strand of a duplex can be continuous along both strands, discontinuous but contiguous, discontinuous with overlaps, or discontinuous with gaps.

GRAMMR can also be applied to DNA fragmentation and reassembly-based DNA shuffling schemes. For instance, in methods where gene fragments are taken through cycles of denaturation, annealing, and extension in the course of gene reassembly, GRAMMR can be employed as an intermediate step.

In one such embodiment, the DNA from a gene, or pool of mutants' genes, is fragmented by enzymatic, mechanical or chemical means, and optionally a size range of said fragments is isolated by a means such as separation on an agarose gel. The starting polynucleotide, such as a wild-type, or a desired variant, or a pool thereof, is added to the fragments and the mixture is denatured and then allowed to anneal. The annealed polynucleotides are treated with a polymerase to fill in the single stranded gaps using the intact strand as a template. The resulting partially complementary double strands will have non-base-paired nucleotides at the sites of the mismatches. Treatment with CEL I (Oleykowski et al., 1998; Yang et al., 2000), or an agent with similar activity, such as RES I, will cause nicking of one or the other polynucleotide strand 3′ of each mismatch. Addition of a polymerase containing a 3′-to-5′ exonuclease that provides proofreading activity, such as DNA Pol I, T4 DNA Pol I, will allow excision of the mismatch, and subsequent 5′-to-3′ polymerase activity will fill in the gap using the other strand as a template. A DNA ligase, such as T4 DNA Ligase, can then seal the nick by restoring the phosphate backbone of the repaired strand. The result is a randomization of sequence variation among input strands to give output strands with potentially improved properties. These output polynucleotides can be cloned directly into a suitable vector, or they can be amplified by PCR before cloning. The resulting clones are subjected to a selection or a screen for improvements in a desired property.

In one such embodiment, the DNA from a pool of mutants' genes is fragmented by enzymatic, mechanical or chemical means, or fragments are generated by limited extension of random oligonucleotides annealed to parental templates (U.S. Pat. No. 5,965,408), and optionally a size range of said fragments is isolated by a means such as separation on an agarose gel. The mixture is denatured and then allowed to anneal. The annealed polynucleotides are optionally treated with a polymerase to fill in the single stranded gaps. The resulting partially complementary double-strand fragments will have non-base-paired nucleotides at the sites of the mismatches. Treatment with CEL I (Oleykowski et al., 1998; Yang et al., 2000), or an agent with similar activity, such as RES I, will cause nicking of one or the other polynucleotide strand 3′ of each mismatch. The activity of a polymerase containing a 3′-to-5′ exonuclease (“proofreading”) activity, such as T4 DNA Polymerase, will allow excision of the mismatch, and subsequent 5′-to-3′ polymerase activity will fill in the gap using the other strand as a template. Optionally, DNA ligase, such as T4 DNA Ligase, can then seal the nick by restoring the phosphate backbone of the repaired strand. The result is a randomization of sequence variation among input strands to give output strands with potentially improved properties. Subsequent rounds of denaturing, annealing, and GRAMMR treatment allow gene reassembly. PCR can be used to amplify the desired portion of the reassembled gene. These PCR output polynucleotides can be cloned into a suitable vector. The resulting clones are subjected to a selection or a screen for the desired functional property.

Another embodiment of the present invention provides starting with a continuous scaffold strand to which fragments of another gene or genes anneal. The flaps and gaps are trimmed and filled as is described in Coco, et al., Nature Biotech 19 (01)354; U.S. Pat. No. 6,319,713, and GRAMMR is performed. In this process, GRAMMR would bring about further sequence reassortment by permitting transfer of sequence information between the template strand and the strand resulting from flap and gap trimming and ligation. This method provides the benefits of incorporating specific sequence patches into one continuous strand followed by GRAMMR of residues that mismatch with the scaffold. By annealing many fragments simultaneously to the same sequence or gene, many individual sites can be addressed simultaneously, thereby allowing reassortment of multiple sequences or genes at once. Unlike the method disclosed by Coco, et al., in the present embodiment, the scaffold is not degraded; rather, the duplex can be directly cloned, or amplified by PCR prior to cloning. Exhaustive mismatch resolution will result in a perfectly duplexed DNA. Partial mismatch resolution will result in essentially two different reassorted products per duplex.

As can be appreciated from the present disclosure, GRAMMR can also be applied to a variety of methods that include the annealing of related DNAs as a step in their process. For example, many site-directed mutagenesis protocols call for the annealing of mutant-encoding DNA molecules to a circular DNA in single-stranded form, either phagemid or denatured plasmid. These DNAs are then extended with a polymerase, followed by treatment with ligase to seal the nick, with further manipulation to remove the parental sequence, leaving the desired mutation or mutations incorporated into the parental genetic background. Though these protocols are generally used to incorporate specific mutations into a particular DNA sequence, it is feasible that the GRAMMR process can be applied to the heteroduplexed molecules generated in such a process to reassort sequence variations between the two strands, thereby resulting in a diverse set of progeny with reassorted genetic variation.

Another embodiment provides for a sequential round of reassortment on a particular region. For example, DNA fragments are annealed to a circular single-strand phagemid DNA, and GRAMMR is performed. The fragments can be treated in order to prevent them from being physically incorporated into the output material. For example, they can be terminated at the 3′ end with di-deoxy residues making them non-extendible. Multiple rounds of reassortment can be performed, but only modified molecules from the original input single stranded DNA clone will be recovered. The consequence will be that the DNA fragments used in this reassortment will contribute only sequence information to the final product and will not be physically integrated into the final recoverable product.

In instances where it is desired to resolve only sites of significant mismatch, that is, patches of more than about 1 to 3 mismatches, S1 nuclease can be used. S1 nuclease is an endonuclease specific for single-stranded nucleic acids. It can recognize and cleave limited regions of mismatched base pairs in DNA:DNA or DNA:RNA duplexes. A mismatch of at least about 4 consecutive base pairs is generally required for recognition and cleavage by S1 nuclease. Mismatch resolution will not occur if both strands are cleaved, so the DNA must be repaired after the first nick and before the counter-nick. Other nucleases may be preferable for specifically tuning cleavage specificity according to sequence, sequence context, or size of mismatch.

In addition, other means of addressing mismatched residues, such as chemical cleavage of mismatches, may be used. Alternatively, one can choose to subject the strands of heteroduplexed DNA to random nicking with an activity such as that exhibited by DNase I or an agent that cleaves only in duplexed regions. If nick formation occurs in a region of identity between the two genes, the DNA ligase present in the reaction will seal the nick with no net transfer of sequence information. However, if nick formation occurs near a site of mismatch, the mismatched bases can be removed by 3′-5′ exonuclease and the gap filled in by polymerase followed by nick sealing by ligase. Alternatively, application of nick-translation through regions of heterogeneity can bring about sequence reassortment. These processes, though not directed exclusively by the mismatch status of the DNA, will serve to transfer sequence information to the repaired strand, and thus result in a reassorted sequence.

GRAMMR can be used for protein, peptide, or aptamer display methods to obtain recombination between library members that have been selected. As fragmentation of the input DNAs is not required for GRAMMR, it may be possible to reassort sequence information between very small stretches of sequence. For instance, DNAs encoding small peptides or RNA aptamers that have been selected for a particular property such as target binding can be reassorted. For annealing to occur between the selected DNA molecules, some level of sequence homology should be shared between the molecules, such as at the 5′ and 3′ regions of the coding sequence, in regions of the randomized sequence segment that bear similarity because of similar binding activities, or through the biasing of codon wobble-base identity to a particular set of defaults.

Manipulation of the reaction temperature at which GRAMMR is conducted can be useful. For example, lower temperatures will help to stabilize heteroduplexes, allowing GRAMMR to be performed on more highly mismatched substrates. Likewise, additives that affect base-pairing between strands, such as salts, PEG, formamide, etc., can be used to alter the stability of the heteroduplex in the GRAMMR reaction, thereby affecting the outcome of the reaction.

In another embodiment, the mismatched double stranded polynucleotides are generated, then treated with a DNA glycosylase to form an apurinic or apyrimidinic site (that is, an “AP site”), an AP endonuclease activity to cleave the phosphodiester bond, deoxyribulose phosphodiesterase to remove the deoxyribose-phosphate molecules, DNA polymerase β or other DNA polymerase to add a single nucleotide to the 3′ end of the DNA strand at the gap, and DNA ligase to seal the gap. The result is a reassortment of sequence variations between input strands to give output strands with potentially improved properties. These output polynucleotides can be cloned directly into a suitable vector, or they can be amplified by PCR before cloning. The resulting clones are subjected to a selection or a screen for improvements in a desired property.

Another embodiment provides for zonal mutagenesis by GRAMMR, that is, random or semi-random mutations at, and in the immediate vicinity of, mismatched residues using nucleotide analogues that have multiple base-pairing potential. This provides for concentration of essentially random mutagenesis at a particular point of interest, and adds another benefit to the present invention. Similar genes with slightly different functions, for example, plant R-genes, enzymes, or the like, will exhibit moderate sequence differences between them in regions that will be important for their own particular activities. Genes that express these activities, such as different substrates, binding partners, regulatory sites, or the like, should have heterogeneity in the regions that govern these functions. Since it is known that the specificity of such functions is associated with these amino acids and their neighbors, GRAMMR mutagenesis might serve to both reassort sequence variation among genes and also direct random mutagenesis to these regions to drive them further and faster evolutionarily, while not disturbing other sequences, such as structural framework, invariant residues, and other such important sites, that are potentially less tolerant to randomization.

Different enzymes with distinct functions will not differ just in the operative regions, such as active sites, regulatory sites, and the like. They are likely to have other differences from one another that arise through genetic drift. Further randomization in the locales of such changes might therefore be considered neutral, minimally important, or deleterious to the outcome of a mutagenesis experiment. In order to direct the random mutagenesis away from such inconsequential sites, and toward sites that might present a better result for random mutagenesis, such as the active site of an enzyme, the codon usage bias of the genes could be manipulated to decrease or increase the overall level of nucleotide complementarity in those regions. If regions of greater complementarity are less susceptible to GRAMMR than regions of lesser complementarity, then the degree of GRAMMR-directed zonal random mutagenesis at a given site can be modulated.
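The codon-usage manipulation described above can be illustrated with a minimal Python sketch. It assumes two in-frame, equal-length coding sequences aligned codon for codon, and it swaps synonymous codons in one sequence to raise or lower its nucleotide identity to the other without changing the encoded protein. The function names (`translate`, `percent_identity`, `recode`) and the example sequences are hypothetical and are not taken from this disclosure.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate an in-frame coding sequence into one-letter amino acids."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def percent_identity(a, b):
    """Percent of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def recode(reference, target, maximize=True):
    """Swap synonymous codons in `target` to raise or lower identity to `reference`.

    Assumption: both inputs are in-frame and aligned codon for codon.
    """
    if len(reference) != len(target) or len(target) % 3:
        raise ValueError("inputs must be equal-length, in-frame sequences")
    pick = max if maximize else min
    out = []
    for i in range(0, len(target), 3):
        ref_codon, tgt_codon = reference[i:i + 3], target[i:i + 3]
        amino_acid = CODON_TABLE[tgt_codon]
        synonyms = [c for c, aa in CODON_TABLE.items() if aa == amino_acid]

        def matches(codon):
            # Number of bases shared with the reference codon.
            return sum(x == y for x, y in zip(codon, ref_codon))

        best = pick(synonyms, key=matches)
        # Leave the original codon alone when it is already as good as the best.
        if matches(tgt_codon) == matches(best):
            best = tgt_codon
        out.append(best)
    return "".join(out)


if __name__ == "__main__":
    ref, tgt = "CTGAGC", "TTAAGT"  # both encode Leu-Ser
    for mode in (True, False):
        new = recode(ref, tgt, maximize=mode)
        print(mode, new, translate(new), round(percent_identity(ref, new), 1))
```

Raising identity in a region would, under the premise stated above, make that region less susceptible to GRAMMR, and lowering it would make the region more susceptible.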

In another embodiment, after heteroduplex molecules are formed, an enzyme with a 3′ to 5′ exonuclease activity is added such that one strand of each end of the heteroduplex is digested back. At a point at which, on average, a desired amount of 3′ to 5′ digestion has occurred, dNTPs are added to allow the 5′ to 3′ polymerase activity from the same or an additional enzyme to restore the duplex using the opposite strand as a template. Thus mismatches in the digested regions are resolved to complementarity. Optionally, the resultant duplexes are purified, denatured and then allowed to anneal. The process of digestion and then polymerization is repeated, resulting in new chimeric sequences. Additional cycles of the process can be performed as desired. Output duplex molecules are cloned and tested for the desired functional property. This process requires no fragmentation and reassembly. In addition, this process requires no endonucleolytic cleavages.
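The digestion and fill-in cycle described above can be sketched as a simple string model in Python. This is an illustrative simplification, not the enzymatic procedure itself. It assumes that both strands are written in top-strand coordinates (the bottom strand is stored as its complement, so a mismatch is any position where the two strings differ) and that exactly `n` nucleotides are removed from each 3′ end. The names `mismatches` and `resect_and_fill` are hypothetical.

```python
def mismatches(top, bottom):
    """Positions at which the two strands of the duplex disagree."""
    return [i for i, (x, y) in enumerate(zip(top, bottom)) if x != y]


def resect_and_fill(top, bottom, n):
    """Model one round of 3' to 5' digestion followed by templated fill-in.

    The 3' end of the top strand is at the right; the 3' end of the bottom
    strand is at the left (the strands are antiparallel). Each digested
    stretch is resynthesized from the opposite strand, so it takes on that
    strand's sequence information.
    """
    if len(top) != len(bottom):
        raise ValueError("strands must have equal length")
    if n < 0 or 2 * n >= len(top):
        raise ValueError("digestion must leave the strands base-paired")
    if n == 0:
        return top, bottom
    new_top = top[:-n] + bottom[-n:]    # right end copied from bottom strand
    new_bottom = top[:n] + bottom[n:]   # left end copied from top strand
    return new_top, new_bottom


if __name__ == "__main__":
    top = "ATGCATGCATGC"
    bottom = "ACGCATGCATTC"             # differs at positions 1 and 10
    print(mismatches(top, bottom))
    print(resect_and_fill(top, bottom, 3))
```

In this example a digestion depth of 3 resolves both mismatches and yields a chimera that carries the top-strand base at position 1 and the bottom-strand base at position 10, whereas a depth of 1 leaves both mismatches in place.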

In another embodiment, after the heteroduplex molecules are formed, an enzyme with a 5′ to 3′ exonuclease activity, such as T7 Gene 6 Exonuclease (as disclosed in Enger, M J and Richardson, C C, J Biol Chem 258(83)11197), is added such that one strand of each end of the heteroduplex is digested. At a point at which, on average, a desired amount of 5′ to 3′ digestion has occurred, the reaction is stopped and the exonuclease inactivated. Oligonucleotide primers complementary to the 5′ and 3′ ends of the target polynucleotides are added and annealed. A DNA polymerase, such as T4 DNA Polymerase, a DNA ligase and dNTPs are added to allow the 5′ to 3′ polymerase activity to extend the primers and restore the duplex using the opposite strand as a template, with ligase sealing the nick. Thus mismatches in the digested regions are resolved to complementarity. Optionally, the resultant duplexes are purified, denatured and then allowed to anneal. The process of digestion and then polymerization is repeated, resulting in new chimeric sequences. Additional cycles of the process can be performed as desired. Output duplex molecules are cloned and tested for the desired functional property. This process requires no fragmentation and reassembly. In addition, this process requires no endonucleolytic cleavages.

In the current invention the random reassortment occurs in an in vitro DNA mismatch-resolution reaction. This method does not require any steps of “gene reassembly” that serve as the foundation for the earlier mutation reassortment (“shuffling”) methods. Instead, it is based upon the ability of a reconstituted or artificial DNA mismatch resolving system to transmit sequence variations from one or more strands of DNA into another DNA strand by hybridization and mismatch resolution in vitro.

In general, standard techniques of recombinant DNA technology are described in various publications, e.g., (Ausubel, 1987; Ausubel, 1999; Sambrook et al., 1989), each of which is incorporated herein in its entirety by reference. Polynucleotide modifying enzymes were used according to the manufacturers' recommendations. If desired, PCR amplimers for amplifying a predetermined DNA sequence may be chosen at the discretion of the practitioner.

It is noted that each of the activities taught in the present invention that are involved in the GRAMMR reaction can be interchanged with a functional equivalent agent with similar activity, and that such changes are within the scope of the present invention. For instance, as was indicated in Example 2, Taq DNA ligase could substitute for T4 DNA ligase. Other ligases can be substituted as well, such as E. coli DNA ligase. Likewise, as shown in Examples 2 and 8, respectively, Pfu polymerase and T7 DNA polymerase can be substituted for T4 DNA polymerase. Other enzymes with appropriate exonuclease activity, with or without associated polymerase, can function in place of any of these enzymes for the exonuclease activity needed for the GRAMMR reaction. In a similar way, any polymerase with functionally equivalent activity to those demonstrated to work for GRAMMR can be used for substitution. These include E. coli Pol I, the Klenow fragment of E. coli Pol I, and polymerase beta, among many others.

Strand cleavage may be brought about in a number of ways. In addition to CEL I, a number of functionally equivalent, and potentially homologous activities found in extracts from a variety of plant species (Oleykowski, Nucleic Acids Res 1998; 26:4597-602) may be used. Other mismatch-directed endonucleases such as T4 endonuclease VII, T7 endonuclease I, and SP nuclease (Oleykowski, Biochemistry 1999; 38:2200-5) may be used. Another particularly useful mismatch-directed endonuclease is RES I. Other nucleases which attack single stranded DNA can be used, such as S1 nuclease, FEN1, cleavase, mung bean nuclease, and nuclease P1. Enzymes that make random cleavage events in DNA, such as pancreatic DNase I, may also be substituted for the strand cleaving activity in GRAMMR. A number of methods for bringing about strand cleavage through other means are also envisioned. These include potassium permanganate used with tetraethylammonium acetate, the use of sterically bulky photoactivatable DNA intercalators such as [Rh(bpy)2(chrysi)]3+, osmium tetroxide with piperidine alkaloid, and hydroxylamine with piperidine alkaloid, as well as the use of radiation energy to bring about strand breakage.

Another embodiment contemplates an isolated protein having mismatch endonuclease activity comprising an amino acid sequence that is at least 60% identical to SEQ ID NO:17 as determined by BLAST analysis.

Another embodiment contemplates an isolated protein having mismatch endonuclease activity comprising an amino acid sequence that is at least 65% identical to SEQ ID NO:17 as determined by BLAST analysis.

Another embodiment contemplates an isolated protein having mismatch endonuclease activity comprising an amino acid sequence that is at least 70% identical to SEQ ID NO:17 as determined by BLAST analysis.

Another embodiment contemplates an isolated protein having mismatch endonuclease activity comprising an amino acid sequence that is at least 80% identical to SEQ ID NO:17 as determined by BLAST analysis.

Another embodiment contemplates an isolated protein having mismatch endonuclease activity comprising an amino acid sequence that is at least 90% identical to SEQ ID NO:17 as determined by BLAST analysis.

Another embodiment contemplates an isolated protein having mismatch endonuclease activity comprising an amino acid sequence that is at least 95% identical to SEQ ID NO:17 as determined by BLAST analysis.
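As a rough illustration of how the percent identity thresholds recited above might be checked, the following Python sketch computes identity over a pre-computed pairwise alignment and lists the thresholds that are met. It is a simplified stand-in and does not perform or reproduce a BLAST analysis. The function names and the short peptide strings are hypothetical and do not represent SEQ ID NO:17.

```python
# Identity thresholds recited in the embodiments above (percent).
THRESHOLDS = (60, 65, 70, 80, 90, 95)


def percent_identity(aligned_a, aligned_b):
    """Percent identity over an alignment; '-' marks a gap column.

    Assumption: identity is counted as identical, non-gap columns divided
    by the total number of alignment columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    identical = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * identical / len(aligned_a)


def thresholds_met(identity):
    """Return every recited threshold that the identity value satisfies."""
    return [t for t in THRESHOLDS if identity >= t]


if __name__ == "__main__":
    pid = percent_identity("MKTAYIAKQR", "MKTAYLAKQR")
    print(pid, thresholds_met(pid))
```

In practice the alignment and the identity figure would come from the BLAST report itself; the sketch only shows how a single identity value maps onto the nested thresholds.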

Another embodiment of the present invention is directed to recombinant plant viral nucleic acids and recombinant viruses which are stable for maintenance and transcription or expression of non-native (foreign) nucleic acid sequences and which are capable of systemically transcribing or expressing such foreign sequences in the host plant. More specifically, recombinant plant viral nucleic acids according to the present invention comprise a native plant viral subgenomic promoter, at least one non-native plant viral subgenomic promoter, a plant viral coat protein coding sequence, and optionally, at least one non-native nucleic acid sequence.

The present invention provides nucleic acid molecules useful as vectors or plasmids for the expression of CEL I endonuclease.

The nucleic acid molecules are CEL I open reading frames contained within vector plasmids. The nucleic acid molecules were deposited with the American Type Culture Collection, Manassas, Va. 20110-2209 USA. The deposits were received and accepted on Dec. 13, 2001, and assigned the following Patent Deposit Designation numbers: PTA-3926 and PTA-3927. The preparation and use of the nucleic acid molecules are further taught in Example 12 herein.

The present invention also provides nucleic acid molecules comprising the nucleic acid sequence of SEQ ID NO:16, useful as vectors or plasmids for the expression of RES I endonuclease, or comprising nucleic acid sequences useful for the expression of proteins having mismatch endonuclease activity comprising an amino acid sequence that is at least 60% identical to SEQ ID NO:17 as determined by BLAST analysis.

The nucleic acid molecule of SEQ ID NO:16 was deposited with the American Type Culture Collection, Manassas, Va. 20110-2209 USA. The deposit was received and accepted on Jul. 30, 2002 and assigned the following Patent Deposit Designation number: PTA-4562. The preparation and use of the nucleic acid molecule of SEQ ID NO:16 are further taught in Example 13 herein.

The present invention further provides a plant cell comprising a vector or plasmid comprising a nucleic acid sequence of SEQ ID NO:16 where the plant cell is a host cell, or production cell.

The present invention also provides a recombinant plant viral nucleic acid comprising at least one sub-genomic promoter capable of transcribing or expressing CEL I, RES I endonuclease, or a mismatch endonuclease comprising an amino acid sequence that is at least 60% identical to SEQ ID NO:17 as determined by BLAST analysis in a plant cell, wherein the plant cell is a host cell, or production cell.

The present invention also provides a process for expressing RES I endonuclease using a recombinant plant viral nucleic acid comprising a nucleic acid sequence of SEQ ID NO:16.

In another embodiment, a plant viral nucleic acid is provided in which the native coat protein coding sequence has been deleted from a viral nucleic acid, a non-native plant viral coat protein coding sequence and a non-native promoter, preferably the subgenomic promoter of the non-native coat protein coding sequence, capable of expression in the plant host, packaging of the recombinant plant viral nucleic acid, and ensuring a systemic infection of the host by the recombinant plant viral nucleic acid, has been inserted. Alternatively, the coat protein gene may be inactivated by insertion of the non-native nucleic acid sequence within it, such that a fusion protein is produced. The recombinant plant viral nucleic acid may contain one or more additional non-native subgenomic promoters. Each non-native subgenomic promoter is capable of transcribing or expressing adjacent genes or nucleic acid sequences in the plant host and incapable of recombination with each other and with native subgenomic promoters. Non-native (foreign) nucleic acid sequences may be inserted adjacent the native plant viral subgenomic promoter or the native and a non-native plant viral subgenomic promoters if more than one nucleic acid sequence is included. The non-native nucleic acid sequences are transcribed or expressed in the host plant under control of the subgenomic promoter to produce the desired products.

In another embodiment, a recombinant plant viral nucleic acid is provided as in the first embodiment except that the native coat protein coding sequence is placed adjacent one of the non-native coat protein subgenomic promoters instead of a non-native coat protein coding sequence.

In yet another embodiment, a recombinant plant viral nucleic acid is provided in which the native coat protein gene is adjacent its subgenomic promoter and one or more non-native subgenomic promoters have been inserted into the viral nucleic acid. The inserted non-native subgenomic promoters are capable of transcribing or expressing adjacent genes in a plant host and are incapable of recombination with each other and with native subgenomic promoters. Non-native nucleic acid sequences may be inserted adjacent the non-native subgenomic plant viral promoters such that said sequences are transcribed or expressed in the host plant under control of the subgenomic promoters to produce the desired product.

In another embodiment, a recombinant plant viral nucleic acid is provided as in the third embodiment except that the native coat protein coding sequence is replaced by a non-native coat protein coding sequence.

The viral vectors are encapsidated by the coat proteins encoded by the recombinant plant viral nucleic acid to produce a recombinant plant virus. The recombinant plant viral nucleic acid or recombinant plant virus is used to infect appropriate host plants. The recombinant plant viral nucleic acid is capable of replication in the host, systemic spread in the host, and transcription or expression of foreign gene(s) in the host to produce the desired product.

As used herein, the term “host” refers to a cell, tissue or organism capable of replicating a vector or plant viral nucleic acid and which is capable of being infected by a virus containing the viral vector or plant viral nucleic acid. This term is intended to include prokaryotic and eukaryotic cells, organs, tissues or organisms, where appropriate.

As used herein, the term “infection” refers to the ability of a virus to transfer its nucleic acid to a host or introduce viral nucleic acid into a host, wherein the viral nucleic acid is replicated, viral proteins are synthesized, and new viral particles assembled. In this context, the terms “transmissible” and “infective” are used interchangeably herein.

As used herein, the term “non-native” refers to any RNA sequence that promotes production of subgenomic mRNA including, but not limited to, 1) plant viral promoters such as ORSV and brome mosaic virus, 2) viral promoters from other organisms such as human sindbis viral promoter, and 3) synthetic promoters.

As used herein, the term “phenotypic trait” refers to an observable property resulting from the expression of a gene.

As used herein, the term “plant cell” refers to the structural and physiological unit of plants, consisting of a protoplast and the cell wall.

As used herein, the term “plant organ” refers to a distinct and visibly differentiated part of a plant, such as root, stem, leaf or embryo.

As used herein, the term “plant tissue” refers to any tissue of a plant in planta or in culture. This term is intended to include a whole plant, plant cell, plant organ, protoplast, cell culture, or any group of plant cells organized into a structural and functional unit.

As used herein, the term “production cell” refers to a cell, tissue or organism capable of replicating a vector or a viral vector, but which is not necessarily a host to the virus. This term is intended to include prokaryotic and eukaryotic cells, organs, tissues or organisms, such as bacteria, yeast, fungus or plant tissue.

As used herein, the term “promoter” refers to the 5′-flanking, non-coding sequence adjacent a coding sequence which is involved in the initiation of transcription of the coding sequence.

As used herein, the term “protoplast” refers to an isolated plant cell without cell walls, having the potency for regeneration into cell culture or a whole plant.

As used herein, the term “recombinant plant viral nucleic acid” refers to plant viral nucleic acid which has been modified to contain non-native nucleic acid sequences.

As used herein, the term “recombinant plant virus” refers to a plant virus containing the recombinant plant viral nucleic acid.

As used herein, the term “subgenomic promoter” refers to a promoter of a subgenomic mRNA of a viral nucleic acid.

As used herein, the term “substantial sequence homology” refers to nucleotide sequences that are substantially functionally equivalent to one another. Nucleotide differences between such sequences having substantial sequence homology will be de minimis in affecting function of the gene products or an RNA coded for by such sequence.

As used herein, the term “transcription” refers to production of an RNA molecule by RNA polymerase as a complementary copy of a DNA sequence.

As used herein, the term “vector” refers to a self-replicating DNA molecule which transfers a DNA segment between cells.

As used herein, the term “virus” refers to an infectious agent composed of a nucleic acid encapsidated in a protein. A virus may be a mono-, di-, tri- or multi-partite virus, as described above.

The present invention provides for the infection of a plant host by a recombinant plant virus containing recombinant plant viral nucleic acid or by the recombinant plant viral nucleic acid which contains one or more non-native nucleic acid sequences which are transcribed or expressed in the infected tissues of the plant host. The product of the coding sequences may be recovered from the plant or cause a phenotypic trait in the plant.

The present invention has a number of advantages, one of which is that the transformation and regeneration of target organisms is unnecessary. Another advantage is that it is unnecessary to develop vectors that integrate a desired coding sequence in the genome of the target organism. Existing organisms can be altered with a new coding sequence without the need of going through a germ cell. The present invention also gives the option of applying the coding sequence to the desired organism, tissue, organ or cell. Recombinant plant viral nucleic acid is also stable for the foreign coding sequences, and the recombinant plant virus or recombinant plant viral nucleic acid is capable of systemic infection in the plant host.

An important feature of the present invention is the preparation of recombinant plant viral nucleic acids (RPVNA) which are capable of replication and systemic spread in a compatible plant host, and which contain one or more non-native subgenomic promoters which are capable of transcribing or expressing adjacent nucleic acid sequences in the plant host. The RPVNA may be further modified to delete all or part of the native coat protein coding sequence and to contain a non-native coat protein coding sequence under control of the native or one of the non-native subgenomic promoters, or put the native coat protein coding sequence under the control of a non-native plant viral subgenomic promoter. The RPVNA have substantial sequence homology to plant viral nucleotide sequences. A partial listing of suitable viruses is provided herein. The nucleotide sequence may be an RNA, DNA, cDNA or chemically synthesized RNA or DNA.

The first step in achieving any of the features of the invention is to modify the nucleotide sequences of the plant viral nucleotide sequence by known conventional techniques such that one or more non-native subgenomic promoters are inserted into the plant viral nucleic acid without destroying the biological function of the plant viral nucleic acid. The subgenomic promoters are capable of transcribing or expressing adjacent nucleic acid sequences in a plant host infected by the recombinant plant viral nucleic acid or recombinant plant virus. The native coat protein coding sequence may be deleted in two embodiments, placed under the control of a non-native subgenomic promoter in a second embodiment, or retained in a further embodiment. If it is deleted or otherwise inactivated, a non-native coat protein gene is inserted under control of one of the non-native subgenomic promoters, or optionally under control of the native coat protein gene subgenomic promoter. The non-native coat protein is capable of encapsidating the recombinant plant viral nucleic acid to produce a recombinant plant virus. Thus, the recombinant plant viral nucleic acid contains a coat protein coding sequence, which may be a native or a non-native coat protein coding sequence, under control of one of the native or non-native subgenomic promoters. The coat protein is involved in the systemic infection of the plant host.

Some of the viruses which meet this requirement, and are therefore suitable, include viruses from the tobacco mosaic virus group such as Tobacco Mosaic virus (TMV), Cowpea Mosaic virus (CMV), Alfalfa Mosaic virus (AMV), Cucumber Green Mottle Mosaic virus watermelon strain (CGMMV-W) and Oat Mosaic virus (OMV), and viruses from the brome mosaic virus group such as Brome Mosaic virus (BMV), broad bean mottle virus and cowpea chlorotic mottle virus. Additional suitable viruses include Rice Necrosis virus (RNV), and geminiviruses such as tomato golden mosaic virus (TGMV), Cassava latent virus (CLV) and maize streak virus (MSV).

CEL I is a mismatch endonuclease isolated from celery. The use of CEL I in a diagnostic method for the detection of mutations in targeted polynucleotide sequences, in particular, those associated with cancer, is disclosed in U.S. Pat. No. 5,869,245. Methods of isolating and preparing CEL I are also disclosed in this patent. However, there is no disclosure in this patent relating to the use of CEL I in DNA sequence reassortment.

Nucleic acid molecules that encode CEL I are disclosed in PCT Application Publication No. WO 01/62974 A1. As with U.S. Pat. No. 5,869,245, the use of CEL I in a diagnostic method for the detection of mutations in targeted polynucleotide sequences associated with cancer is disclosed. Also similarly, there is no disclosure relating to the use of CEL I in DNA reassortment.

The use of RES I endonuclease is contemplated in diagnostic methods for the detection of mutations in targeted polynucleotide sequences, in particular, those associated with cancer. Examples of some of these types of diagnostic methods are disclosed in U.S. Pat. No. 5,869,245, Sokurenko, et al., and Del Tito, et al.

The reactivity of Endonuclease VII of phage T4 with DNA-loops of eight, four, or one nucleotide, or any of 8 possible base mismatches in vitro is disclosed in “Endonuclease VII of Phage T4 Triggers Mismatch Correction in Vitro” Solaro, et al., J Mol Biol 230(93)868. The publication reports a mechanism where Endonuclease VII introduces double stranded breaks by creating nicks and counternicks within six nucleotides 3′ of the mispairing. The publication discloses that a time delay between the occurrence of the first nick and the counternick was sufficient to allow the 3′-5′ exonuclease activity of gp43 to remove the mispairing and its polymerase activity to fill in the gap before the occurrence of the counternick. Nucleotides are erased from the first nick, which is located 3′ of the mismatch on either strand and stops 5′ of the mismatch at the first stable base-pair. The polymerase activity proceeds in the 5′ to 3′ direction towards the initial nick, which is sealed by DNA ligase. As a result, very short repair tracks of 3 to 4 nucleotides extend across the site of the former mismatch. The publication concludes with a discussion regarding the various activities Endonuclease VII may have within phage T4. However, the publication does not disclose any practical utility for Endonuclease VII outside of phage T4, and there is no disclosure regarding its applicability in DNA reassortment.

A method for creating libraries of chimeric DNA sequences in vivo in Escherichia coli is disclosed in Nucleic Acids Research, 1999, Vol 27, No. 18, e18, Volkov, A. A., Shao, Z., and Arnold, F. H. The method uses a heteroduplex formed in vitro to transform E. coli where repair of regions of non-identity in the heteroduplex creates a library of new, recombined sequences composed of elements of each parent. Although the publication discloses the use of this method as a convenient addition to existing DNA recombination methods, that is, DNA shuffling, the disclosed method is limited to the in vivo environment of E. coli. The publication states that there is more than one mechanism available for mismatch repair in E. coli, and that the ‘long patch’ repair mechanism, which utilizes the MutS/L/H enzyme system, was probably responsible for the heteroduplex repair.

CITED REFERENCES

1. Arkin, A. P. and Youvan, D. C. (1992) An algorithm for protein engineering: simulations of recursive ensemble mutagenesis. Proc Natl Acad Sci USA, 89, 7811-7815.
2. Ausubel, F. M. (1987) Current protocols in molecular biology. Published by Greene Pub. Associates and Wiley-Interscience: J. Wiley, New York.
3. Ausubel, F. M. (1999) Short protocols in molecular biology: a compendium of methods from Current protocols in molecular biology. Wiley, New York.
4. Barnes, W. M. (1994) PCR amplification of up to 35-kb DNA with high fidelity and high yield from lambda bacteriophage templates. Proc Natl Acad Sci USA, 91, 2216-2220.
5. Bartel, D. P. and Szostak, J. W. (1993) Isolation of new ribozymes from a large pool of random sequences. Science, 261, 1411-1418.
6. Cadwell, R. C. and Joyce, G. F. (1992) Randomization of genes by PCR mutagenesis. PCR Methods Appl, 2, 28-33.
7. Calogero, S., Bianchi, M. E. and Galizzi, A. (1992) In vivo recombination and the production of hybrid genes. FEMS Microbiol Lett, 76, 41-44.
8. Caren, R., Morkeberg, R. and Khosla, C. (1994) Efficient sampling of protein sequence space for multiple mutants. Biotechnology (NY), 12, 517-520.
9. Delagrave, S., Goldman, E. R. and Youvan, D. C. (1993) Recursive ensemble mutagenesis. Protein Eng, 6, 327-331.
10. Delagrave, S. and Youvan, D. C. (1993) Searching sequence space to engineer proteins: exponential ensemble mutagenesis. Biotechnology (NY), 11, 1548-1552.
11. Goldman, E. R. and Youvan, D. C. (1992) An algorithmically optimized combinatorial library screened by digital imaging spectroscopy. Biotechnology (NY), 10, 1557-1561.
12. Gram, H., Marconi, L. A., Barbas, C. F., 3rd, Collet, T. A., Lerner, R. A. and Kang, A. S. (1992) In vitro selection and affinity maturation of antibodies from a naive combinatorial immunoglobulin library. Proc Natl Acad Sci USA, 89, 3576-3580.
13. Hayashi, N., Welschof, M., Zewe, M., Braunagel, M., Dubel, S., Breitling, F. and Little, M. (1994) Simultaneous mutagenesis of antibody CDR regions by overlap extension and PCR. Biotechniques, 17, 310, 312, 314-315.
14. Hermes, J. D., Blacklow, S. C. and Knowles, J. R. (1990) Searching sequence space by definably random mutagenesis: improving the catalytic potency of an enzyme. Proc Natl Acad Sci USA, 87, 696-700.
15. Holland, J. H. (1992) Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence. MIT Press, Cambridge, Mass.
16. Ji, G. and Silver, S. (1992) Regulation and expression of the arsenic resistance operon from Staphylococcus aureus plasmid pI258. J Bacteriol, 174, 3684-3694.
17. Kauffman, S. A. (1993) The origins of order: self-organization and selection in evolution. Oxford University Press, New York.
18. Marton, A., Delbecchi, L. and Bourgaux, P. (1991) DNA nicking favors PCR recombination. Nucleic Acids Res, 19, 2423-2426.
19. Meyerhans, A., Vartanian, J. P. and Wain-Hobson, S. (1990) DNA recombination during PCR. Nucleic Acids Res, 18, 1687-1691.
20. Nissim, A., Hoogenboom, H. R., Tomlinson, I. M., Flynn, G., Midgley, C., Lane, D. and Winter, G. (1994) Antibody fragments from a ‘single pot’ phage display library as immunochemical reagents. EMBO J, 13, 692-698.
21. Oleykowski, C. A., Bronson Mullins, C. R., Godwin, A. K. and Yeung, A. T. (1998) Mutation detection using a novel plant endonuclease. Nucleic Acids Res, 26, 4597-4602.
22. Oliphant, A. R., Nussbaum, A. L. and Struhl, K. (1986) Cloning of random-sequence oligodeoxynucleotides. Gene, 44, 177-183.
23. Sambrook, J., Maniatis, T. and Fritsch, E. F. (1989) Molecular cloning: a laboratory manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.
24. Stemmer, W. P. (1994a) DNA shuffling by random fragmentation and reassembly: in vitro recombination for molecular evolution. Proc Natl Acad Sci USA, 91, 10747-10751.
25. Stemmer, W. P. (1994b) Rapid evolution of a protein in vitro by DNA shuffling. Nature, 370, 389-391.
26. Stemmer, W. P., Morris, S. K. and Wilson, B. S. (1993) Selection of an active single chain Fv antibody from a protein linker library prepared by enzymatic inverse PCR. Biotechniques, 14, 256-265.
27. Winter, G., Griffiths, A. D., Hawkins, R. E. and Hoogenboom, H. R. (1994) Making antibodies by phage display technology. Annu Rev Immunol, 12, 433-455.
28. Yang, B., Wen, X., Kodali, N. S., Oleykowski, C. A., Miller, C. G., Kulinski, J., Besack, D., Yeung, J. A., Kowalski, D. and Yeung, A. T. (2000) Purification, cloning, and characterization of the CEL I nuclease. Biochemistry, 39, 3533-3541.
29. Sokurenko, E. V., Tchesnokova, V., Yeung, A. T., Oleykowski, C. A., Trintchina, E., Hughes, K. T., Rashid, R. A., Brint, J. M., Moseley, S. L., Lory, S. (2001) Detection of simple mutations and polymorphisms in large genomic regions. Nucleic Acids Res, 29, e111.
30. Yang, T. T., Sinai, P., Green, G., Kitts, P. A., Chen, Y. T., Lybarger, L., Chervenak, R., Patterson, G. H., Piston, D. W., Kain, S. R. (1998) Improved fluorescence and dual color detection with enhanced blue and green variants of the green fluorescent protein. J Biol Chem, 273, 8212-8216.
31. Crameri, A., Whitehorn, E. A., Tate, E., Stemmer, W. P. (1996) Improved green fluorescent protein by molecular evolution using DNA shuffling. Nat Biotechnol, 14, 315-319.
32. Heim, R., Prasher, D. C., Tsien, R. Y. (1994) Wavelength mutations and posttranslational autoxidation of green fluorescent protein. Proc Natl Acad Sci USA, 91, 12501-12504.
33. Del Tito, B. J., Jr., Poff, H. E., 3rd, Novotny, M. A., Cartledge, D. M., Walker, R. I., 2nd, Earl, C. D., Bailey, A. L. (1998) Automated fluorescent analysis procedure for enzymatic mutation detection. Clin Chem, 44, 731-739.
34. Perez-Amador, M. A., Abler, M. L., De Rocher, E. J., Thompson, D. M., van Hoof, A., LeBrasseur, N. D., Lers, A., Green, P. J. (2000) Identification of BFN1, a bifunctional nuclease induced during leaf and stem senescence in Arabidopsis. Plant Physiol, 122, 169-179.
35. Buchanan-Wollaston, V. (1997) The molecular biology of leaf senescence. J Exp Bot, 48, 181-199.

The following non-limiting examples are provided to illustrate the present invention.

EXAMPLE 1 Cleavage of Mismatched DNA Substrate by CEL I

This example teaches the preparation of CEL I enzyme and its use in the cleavage of mismatched DNA substrate.

CEL I enzyme was prepared from celery stalks using the homogenization, ammonium sulfate, and Concanavalin A-Sepharose protocol described by Yang et al. (Biochemistry, 39:3533-3541 (2000)), incorporated herein by reference. A 1.5 kg sample of chilled celery stalks was homogenized with a juice extractor. One liter of juice was collected, adjusted to 100 mM Tris-HCl, pH 7.7 with 100 micromolar phenylmethylsulfonyl fluoride (PMSF), and filtered through two layers of Miracloth (Calbiochem). Solid (NH₄)₂SO₄ was slowly added to 25% saturation while stirring on ice. After 30 minutes, the suspension was centrifuged at 27,000 g for 1.5 hours at 4° C. The supernatants were collected and adjusted with solid (NH₄)₂SO₄ to 80% saturation while stirring on ice, followed by centrifugation at 27,000 g for 2 hours. The pellets were re-suspended in buffer B (0.1 M Tris-HCl, pH 7.7, 0.5 M KCl, 100 micromolar PMSF) and dialyzed against the same buffer.

Concanavalin A (ConA) Sepharose affinity chromatography was performed by first incubating the dialyzed sample with 2 ml of ConA resin overnight with gentle agitation. The ConA resin was then packed into a 0.5 cm diameter column and washed with several column volumes of buffer B. Elution was performed using 0.3 M alpha-methyl-mannoside in buffer B. Fractions were collected in 1 ml aliquots. Fractions were assayed for mismatch cleavage activity on a radiolabeled mismatch substrate by incubating 0.1 microliter of each fraction with the mismatched probe in buffer D (20 mM Tris-HCl, pH 7.4, 25 mM KCl, 10 mM MgCl₂) for 30 minutes at 45° C. as described by Oleykowski et al. (Nucleic Acids Research 26:4597-4602 (1998)), incorporated herein by reference. Reaction products were visualized by separation on 10% TBE-PAGE gels containing 7% urea (Invitrogen), followed by autoradiography. Aliquots of the CEL I fractions having mismatch cleavage activity were stored frozen at −20° C. A series of five-fold dilutions of CEL I fraction #5 was then analyzed for mismatch cleavage of radiolabeled mismatch substrate. Reactions were performed in buffer D, in New England BioLabs (NEB) T4 DNA ligase buffer (50 mM Tris-HCl, pH 7.5, 10 mM MgCl₂, 10 mM dithiothreitol (DTT), 1 mM ATP, 25 microgram/ml BSA), or in Gibco/BRL T4 DNA ligase buffer (50 mM Tris-HCl, pH 7.6, 10 mM MgCl₂, 1 mM DTT, 1 mM ATP, 5% (w/v) polyethylene glycol-8000). Reaction products were visualized as above. Cleavage activities in buffer D and in NEB T4 DNA ligase buffer were found to be roughly equivalent, whereas cleavage in the PEG-containing Gibco/BRL ligase buffer was enhanced five- to ten-fold compared to the other buffers.

Additional analysis of CEL I activity was carried out using defined heteroduplex DNAs from two different Green Fluorescent Protein (GFP) genes as substrate. This GFP heteroduplex substrate was prepared by annealing single-stranded DNAs corresponding to cycle 3 GFP on the sense strand and wild-type GFP on the antisense strand. The single-stranded DNAs had been synthesized by asymmetric PCR and isolated by agarose gel electrophoresis. After annealing by heating to 90° C. and cooling in the presence of 1×NEB restriction enzyme buffer 2 (10 mM Tris-HCl, pH 7.9, 10 mM MgCl₂, 50 mM NaCl, 1 mM dithiothreitol), the heteroduplex DNA was isolated by agarose gel electrophoresis followed by excision of the heteroduplex band and extraction using Qiaquick DNA spin columns. A total of twenty-eight mismatches, one or two nucleotides in length, occur throughout the length of the heteroduplex molecule. The distribution of the mismatches ranges from small clusters of several mismatches separated by one or two nucleotides to mismatches separated by more than thirty base pairs on either side.

A series of three-fold dilutions of CEL I in 1×NEB T4 DNA ligase buffer was prepared, and one microliter aliquots of each were incubated in two separate series of 10 microliter reactions, each containing as substrate either 0.5 microgram of a supercoiled plasmid preparation or one hundred nanograms of the cycle 3/wild-type GFP heteroduplex. All reactions took place in 1×NEB T4 DNA ligase buffer. Reactions were incubated at 45° C. for 30 minutes and run on a 1.5% TBE-agarose gel in the presence of ethidium bromide.

Treatment of the supercoiled plasmid preparation with increasing amounts of CEL I resulted in the conversion of supercoiled DNA to nicked circular, then linear molecules, and then to smaller fragments of DNA of random size. Treatment of the mismatched GFP substrate with the CEL I preparation resulted in the digestion of the full-length heteroduplex into laddered DNA bands which are likely to represent cleavage on opposite DNA strands in the vicinity of clusters of mismatches. Further digestion resulted in the conversion of the mismatched GFP substrate to smaller DNAs that may represent a limit digest of the heteroduplex DNA by the CEL I preparation.

EXAMPLE 2 Conservation of Full Length GFP Gene with Mismatch Resolution Cocktails

This example teaches various mismatch resolution cocktails that conserve the full length GFP gene.

Mismatched GFP substrate was treated with various concentrations of CEL I in the presence of cocktails of enzymes that together constitute a synthetic mismatch resolution system. The enzymes used were CEL I, T4 DNA polymerase, Taq DNA polymerase and T4 DNA ligase. CEL I activity should nick the heteroduplex 3′ of mismatched bases. T4 DNA polymerase contains 3′-5′ exonuclease activity for excision of the mismatched base from the nicked heteroduplex. T4 DNA polymerase and Taq DNA polymerase contain DNA polymerase activity capable of filling the gap. T4 DNA ligase seals the nick in the repaired molecule. Taq DNA polymerase also has 5′ flap-ase activity.

Matrix experiments were performed to identify the reaction conditions that would serve to resolve mismatches in the GFP heteroduplex substrate. In one experiment, cycle 3/wild-type GFP heteroduplex was incubated in a matrix format with serial dilutions of CEL I fraction number five (described above) at eight different concentrations. Each reaction contained 100 nanograms of heteroduplex substrate and 0.2 microliters of T4 DNA ligase (Gibco BRL) in 1×NEB T4 DNA ligase buffer and dNTPs at 250 micromolar each, in a reaction volume of 10 microliters. In all, the matrix contained 96 individual reactions. One full set of reactions was incubated at room temperature for 30 minutes while another full set was incubated at 37° C. for 30 minutes.

After incubation, PCR was used to amplify the GFP gene from each reaction. Aliquots from each PCR were then digested with HindIII and HpaI and electrophoresed on 3% agarose gels with ethidium bromide. Only cycle 3 GFP has a HindIII site and only wild-type GFP encodes a HpaI site.

If DNA mismatch resolution occurred at either the HindIII or HpaI mismatched sites, then a proportion of the PCR product would be expected to contain both sites, yielding a novel band. The band was observed in all samples, including the negative control samples that had neither CEL I, nor T4 DNA polymerase, nor Taq DNA polymerase. The results suggested that a basal level of background recombination may have occurred at some point in the experiment other than in the GRAMMR reaction, possibly in the PCR step. PCR-mediated recombination is known to occur at some frequency between related sequences during amplification [Paabo, et al., DNA damage promotes jumping between templates during enzymatic amplification. J Biol Chem 265(90)4718-4721].

In another experiment, 200 nanograms of cycle 3/wild-type GFP heteroduplex was treated with CEL I and T4 DNA polymerase in various concentrations along with 2.5 units of Taq DNA polymerase in the presence or absence of T4 DNA ligase (0.2 units; Gibco BRL). Each reaction contained 1×NEB T4 DNA ligase buffer with 0.05 mM each dNTP in a final volume of 20 microliters. Reactions were incubated for 30 minutes at 37° C. and 10 microliters were run on a 2% TBE-agarose gel in the presence of ethidium bromide. Results showed that in the presence of DNA ligase, but in the absence of T4 DNA polymerase, increasing amounts of CEL I caused greater degradation of the heteroduplexed DNA, but that this effect could be counteracted by increasing the amount of T4 DNA polymerase in the reaction. These results indicated that the various components of the complete reaction could act together to conserve the integrity of the full-length gene through DNA mismatch resolution.

Another matrix experiment was conducted to expand on these results and to identify additional conditions for DNA mismatch resolution for this synthetic system. 60 nanograms of cycle 3/wild-type GFP heteroduplex were treated with CEL I and T4 DNA polymerase at various concentrations in the presence of 2.5 units of Taq DNA polymerase and 0.2 units of T4 DNA ligase in 1×NEB T4 DNA ligase buffer containing 0.5 mM of each dNTP in a reaction volume of 10 microliters. Each set of reactions was incubated for 1 hour at either 20° C., 30° C., 37° C., or 45° C. All reactions were then run on a 1.5% TBE-agarose gel in the presence of ethidium bromide. The results showed that the GFP heteroduplex was cleaved into discrete fragments by the CEL I preparation alone. The success of DNA mismatch resolution was initially gauged by the degree to which the apparent full-length integrity of the GFP sequence was maintained by the other components of the mismatch resolution system in the presence of CEL I. Conditions of enzyme concentration and temperature were identified that conserved a high proportion of the DNA as full-length molecules in this assay, namely one microliter of the CEL I fraction five preparation (described in Example 1) with one microliter (1 unit) of the T4 DNA polymerase in the presence of the other reaction components, which were held constant in the experiment. It was found that as the reaction temperature increased, the degradative activity of CEL I increased accordingly. Furthermore, it was shown that the other components of the repair reaction acted to conserve the integrity of the full-length DNA at 20° C., 30° C., and 37° C., but were remarkably less efficient at conserving the full-length DNA at 45° C. From these results, we concluded that under these experimental conditions, incubation at 45° C. was not optimal for the process of GRAMMR, and that incubation at 20° C., 30° C., and 37° C. was permissible.

Another experiment was performed in which alternative enzymes were used for the DNA mismatch resolution reaction. Instead of T4 DNA ligase, Taq DNA ligase was used. Pfu DNA polymerase (Stratagene) was employed in a parallel comparison to a set of reactions that contained T4 DNA polymerase as the 3′ exonuclease/polymerase. Reactions were carried out in Taq DNA ligase buffer containing 8 units of Taq DNA ligase (NEB), 2.5 units Taq DNA polymerase, 0.5 mM of each dNTP, various dilutions of CEL I, and either T4 DNA polymerase or Pfu DNA polymerase. Reactions were run on a 1.5% TBE-agarose gel in the presence of ethidium bromide. It was found that in the presence of the Pfu DNA polymerase, Taq DNA polymerase, and Taq DNA ligase, the full-length integrity of the CEL I-treated substrate DNA was enhanced compared to DNA incubated with CEL I alone. This result shows that enzymes with functionally equivalent activities can be successfully substituted into the GRAMMR reaction.

EXAMPLE 3 Restoration of Restriction Sites to GFP Heteroduplex DNA after DNA Mismatch Resolution (GRAMMR)

This experiment teaches the operability of genetic reassortment by DNA mismatch resolution (GRAMMR) by demonstrating the restoration of restriction sites.

The full-length products of a twenty-fold scale-up of the GRAMMR reaction, performed at 37° C. for one hour using the optimal conditions found above, were gel-isolated and subjected to restriction analysis by endonucleases whose recognition sites overlap with mismatches in the GFP heteroduplex, thereby rendering those sites in the DNA resistant to restriction enzyme cleavage. The 1× reaction contained sixty nanograms of heteroduplex DNA, one microliter of CEL I fraction five (described in Example 1), and one unit of T4 DNA polymerase in the presence of 2.5 units of Taq DNA polymerase and 0.2 units of T4 DNA ligase in 1×NEB T4 DNA ligase buffer containing 0.5 mM of each dNTP in a reaction volume of 10 microliters. The enzymes used were BamHI, HindIII, HpaI, and XhoI. Negative controls consisted of untreated GFP heteroduplex. Positive controls consisted of Cycle 3 or wild-type GFP sequences, individually. All controls were digested with the same enzymes as the product of the DNA mismatch resolution reaction. All samples were run on a 2% TBE-agarose gel in the presence of ethidium bromide.

After treatment with the mismatch resolution cocktail, a proportion of the DNA gained sensitivity to BamHI and XhoI restriction endonucleases, indicating that DNA mismatch resolution had occurred. The HpaI-cut samples could not be interpreted since a low level of cleavage occurred in the negative control. The HindIII, BamHI and XhoI sites displayed different degrees of cleavage in the GRAMMR-treated samples. Restoration of the XhoI site was more extensive than that of the BamHI site, which was, in turn, more extensive than restoration of the HindIII site.

The extent to which cleavage occurs is indicative of the extent to which mismatches in the DNA have been resolved at that site. Differences in mismatch resolution efficiency may relate to the nature or density of mismatches present at those sites. For example, the XhoI site spans a three-mismatch cluster, whereas the BamHI site spans two mismatches and the HindIII site spans a single mismatch.

EXAMPLE 4 GRAMMR-Reassorted GFP Genes

This example demonstrates that GRAMMR can reassort sequence variation between two gene sequences in a heteroduplex and that there are no significant differences between GRAMMR products that were directly cloned and those that were PCR amplified prior to cloning.

The GRAMMR-treated DNA molecules of Example 3 were subsequently either directly cloned by ligation into pCR-Blunt II-TOPO (Invitrogen), or amplified by PCR and ligated into pCR-Blunt II-TOPO according to the manufacturer's instructions, followed by transformation into E. coli. After picking individual colonies and growing in liquid culture, DNA was prepared and the sequences of the GFP inserts were determined. As negative controls, the untreated GFP heteroduplex substrate was either directly cloned or PCR amplified prior to cloning into the plasmid.

In GRAMMR, reassortment of sequence information results from a process of information transfer from one strand to the other. These sites of information transfer are analogous to crossover events that occur in recombination-based DNA shuffling methods. For the purposes of relating the results of these reassortment experiments, however, the GRAMMR output sequences are described in terms of crossovers. Sequences of twenty full-length GFP clones that were derived from the GRAMMR-treated GFP genes were analyzed. Four of these clones were derived from DNA that had been directly cloned into pZeroBlunt (Invitrogen) following GRAMMR treatment (no PCR amplification). The other sixteen sequences were cloned after PCR amplification. Analysis of these full-length GFP sequences revealed that all twenty sequences had undergone sequence reassortment having between one and ten crossovers per gene. A total of 99 crossovers were found in this set of genes, giving an average of about 5 crossovers per gene. With the distance between the first and last mismatches of about 590 nucleotides, an overall frequency of roughly one crossover per 120 base-pairs was calculated. Within this set of twenty clones, a total of seven point mutations had occurred within the sequences situated between the PCR primer sequences, yielding a mutation frequency of roughly 0.05%.
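
The crossover averages quoted above follow directly from the reported counts. As a minimal sketch (Python is used only for illustration; the function and variable names are hypothetical and are not part of the original disclosure), the arithmetic can be reproduced as follows:

```python
def crossover_stats(n_clones, total_crossovers, mismatch_span_nt):
    """Return (crossovers per gene, nucleotides per crossover).

    Inputs are the counts reported in the text; the helper itself is
    an illustrative assumption, not part of the described method.
    """
    per_gene = total_crossovers / n_clones
    nt_per_crossover = mismatch_span_nt / per_gene
    return per_gene, nt_per_crossover


# Reported values: 20 clones, 99 crossovers, about 590 nucleotides
# between the first and last mismatches.
per_gene, nt_per_crossover = crossover_stats(20, 99, 590)
print(f"{per_gene:.2f} crossovers per gene")
print(f"one crossover per {nt_per_crossover:.0f} nucleotides")
```

The computed values (4.95 crossovers per gene and one crossover per roughly 119 nucleotides) agree with the rounded figures of about 5 and roughly 120 given above.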

Thirty-five clones that had not been subjected to GRAMMR treatment were sequenced. Of these controls, fourteen were derived from direct cloning and twenty-one were obtained after PCR amplification using the GFP heteroduplex as template. Of these thirty-five non-GRAMMR-treated control clones, eight were recombinants, ranging from one to three crossovers, with most being single crossover events. A total of twenty-five point mutations had occurred within the sequences situated between the PCR primers, yielding a mutation frequency of roughly 0.1%.

No significant differences were observed between the GRAMMR-treated products that were either directly cloned or PCR amplified. Notably, though, in the non-GRAMMR-treated controls, the frequency of recombinants was higher in the PCR amplified DNAs than in the directly cloned DNAs. This higher frequency is consistent with results obtained by others in which a certain level of recombination was found to be caused by “jumping PCR” [Paabo, et al., DNA damage promotes jumping between templates during enzymatic amplification. J Biol Chem 265(90)4718-4721].

EXAMPLE 5 Heteroduplex Substrate Preparation for Plasmid-on-Plasmid Genetic Reassortment by DNA Mismatch Resolution (POP GRAMMR) of GFP Plasmids

This example teaches that heteroduplex substrate for Genetic Reassortment by DNA Mismatch Resolution can be in the form of intact circular plasmids. Cycle 3 GFP and wild-type GFP heteroduplex molecules were prepared in plasmid-on-plasmid (POP) format. In this format, the GFP sequences were reassorted within the context of a circular double-stranded plasmid vector backbone. This made possible the recovery of the reassorted product by direct transformation of E. coli using an aliquot of the GRAMMR reaction. Consequently, neither PCR amplification nor other additional manipulation of the GRAMMR-treated DNA was necessary to obtain reassorted clones.

Mismatched DNA substrate for POP-GRAMMR reactions was generated containing wild-type GFP (SEQ ID NO:01) and Cycle 3 GFP (SEQ ID NO:02), resulting in the two pBluescript-based plasmids, pBSWTGFP (SEQ ID NO:03) and pBSC3GFP (SEQ ID NO:04), respectively. The GFPs were inserted between the KpnI and EcoRI sites of the pBluescript polylinker so that the only sequence differences between the two plasmids occurred at sites where the wild-type and Cycle 3 GFPs differ from one another. Both plasmids were linearized by digestion of the plasmid backbone with SapI, cleaned up using a DNA spin-column, mixed, amended to 1×PCR buffer (Barnes, 1994; PNAS, 91, 2216-2220), heated in a boiling water bath for three minutes, and slow-cooled to room temperature to anneal the denatured DNA strands. Denaturing and annealing these DNAs led to a mixture of duplexes: re-formed parental duplexes and heteroduplexes formed by the annealing of strands from each of the two input plasmids. Parental duplexes were deemed undesirable for GRAMMR and were removed by digestion with restriction enzymes that cut in one or the other parental duplex but not in the heteroduplexed molecules. PmlI and XhoI were chosen for this operation since PmlI cuts only in the wild-type GFP sequence and XhoI cuts only in Cycle 3 GFP. After treatment with these enzymes, the full-length, uncut heteroduplex molecules were resolved from the PmlI- and XhoI-cut parental homoduplexes on an agarose gel and purified by excision of the band and purification with a DNA spin column.
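
The logic of the parental-duplex removal step can be illustrated with a short enumeration. This is a sketch under stated assumptions only: it treats each duplex as a pairing of one sense and one antisense strand, and it assumes, as the text indicates, that PmlI and XhoI cleave only when both strands carry the corresponding parental sequence. The names `SENSE`, `ANTISENSE`, and `is_cut` are hypothetical.

```python
from itertools import product

# Strand origins after denaturing the two linearized plasmids:
# "WT" = wild-type GFP plasmid, "C3" = Cycle 3 GFP plasmid.
SENSE = ["WT", "C3"]
ANTISENSE = ["WT", "C3"]


def is_cut(sense, antisense):
    """Assumed model: PmlI cuts only the WT/WT homoduplex and XhoI cuts
    only the C3/C3 homoduplex; mismatched sites in heteroduplexes
    resist both enzymes."""
    pml_cuts = sense == "WT" and antisense == "WT"
    xho_cuts = sense == "C3" and antisense == "C3"
    return pml_cuts or xho_cuts


# All possible duplexes formed on re-annealing.
duplexes = list(product(SENSE, ANTISENSE))

# Duplexes that remain full length after PmlI + XhoI digestion.
survivors = [d for d in duplexes if not is_cut(*d)]

for sense, antisense in duplexes:
    status = "cut" if is_cut(sense, antisense) else "full length"
    print(f"sense {sense} / antisense {antisense}: {status}")
```

Under this model, only the two heteroduplex pairings remain full length, which is why excising the uncut band enriches for heteroduplex substrate.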

The resulting population of heteroduplexed molecules was treated with DNA ligase to convert the linear DNA into circular, double-stranded DNA heteroduplexes. After confirmation by agarose gel-shift analysis, the circular double-stranded GFP heteroduplexed plasmid was used as substrate for GRAMMR reactions. Examples of the resulting clones are included as SEQ ID NO:05, SEQ ID NO:06, SEQ ID NO:07, and SEQ ID NO:08.

EXAMPLE 6 Exemplary Reaction Parameters for Genetic Reassortment by DNA Mismatch Resolution: CEL I and T4 DNA Polymerase Concentrations Compared

The GRAMMR reaction involves the interaction of numerous enzymatic activities. Several parameters associated with the GRAMMR reaction were examined, such as CEL I concentration, T4 DNA polymerase concentration, reaction temperature, substitution of T4 DNA polymerase with T7 DNA polymerase, the presence of Taq DNA polymerase, and the source of the CEL I enzyme. A matrix of three different CEL I concentrations versus two concentrations of T4 DNA polymerase was set up to examine the limits of the in vitro DNA mismatch resolution reaction.

Twenty-one nanograms (21 ng) of the circular double-stranded heteroduplexed plasmid, prepared as described above, was used as substrate in a series of ten microliter reactions containing 1×NEB ligase buffer, 0.5 mM each dNTP, 1.0 unit Taq DNA polymerase, 0.2 units T4 DNA ligase (Gibco/BRL), either 1.0 or 0.2 units T4 DNA polymerase, and either 0.3, 0.1, or 0.03 microliters of a CEL I preparation (fraction 5, described in Example 1). Six reactions representing all six combinations of the two T4 DNA polymerase concentrations with the three CEL I concentrations were prepared, split into equivalent sets of five microliters, and incubated at either 20 degrees C. or 37 degrees C. A control reaction containing no CEL I and 0.2 unit of T4 DNA polymerase with the other reaction components was prepared and incubated at 37 degrees C. After 30 minutes, one microliter aliquots of each reaction were transformed into competent DH5-alpha E. coli which were then plated on LB amp plates. Colonies were picked and cultured. Plasmid DNA was extracted and examined by restriction fragment length polymorphism analysis (RFLP) followed by sequence analysis of the GFP gene sequences. RFLP analysis was based on differences in several restriction enzyme recognition sites between the wild-type and Cycle 3 GFP genes. The RFLP results showed that throughout the CEL I/T4 DNA polymerase/temperature matrix, reassortment of restriction sites, that is, GRAMMR, had occurred, and that no such reassortment had occurred in the zero-CEL I control clones. DNA sequence analysis confirmed that reassortment had occurred in all of the CEL I-containing samples. Sequencing also confirmed that the zero-CEL I controls were not reassorted, with the exception of a single clone of the 16 control clones, which had a single-base change from one gene sequence to the other, presumably resulting either from repair in E. coli or from random mutation. The sequences of several exemplary GRAMMR-reassorted GFP clones are shown, all of which came from the reaction containing 0.3 microliters of the CEL I preparation and 1.0 unit of T4 DNA polymerase incubated at 37 degrees C. The parental wild-type and Cycle 3 GFP genes are shown first for reference.
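
The reaction matrix described in this example can be laid out programmatically. The following is a minimal sketch, not part of the original protocol: the dictionary keys and the `build_matrix` helper are hypothetical, and the enzyme amounts are those stated for each ten microliter reaction as prepared (before splitting into five microliter sets).

```python
from itertools import product

# Amounts stated in the text for each ten microliter reaction.
T4_POL_UNITS = [1.0, 0.2]     # units of T4 DNA polymerase
CEL_I_UL = [0.3, 0.1, 0.03]   # microliters of CEL I fraction 5
TEMPS_C = [20, 37]            # incubation temperatures, degrees C


def build_matrix():
    """Enumerate the six enzyme combinations at both temperatures,
    then add the single zero-CEL I control run at 37 degrees C."""
    reactions = [
        {"t4_pol_units": pol, "cel_i_ul": cel, "temp_c": temp}
        for pol, cel, temp in product(T4_POL_UNITS, CEL_I_UL, TEMPS_C)
    ]
    reactions.append({"t4_pol_units": 0.2, "cel_i_ul": 0.0, "temp_c": 37})
    return reactions


reactions = build_matrix()
for r in reactions:
    print(r)
```

This gives twelve test incubations (six enzyme combinations at two temperatures) plus one control.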

EXAMPLE 7 Taq DNA Polymerase is Not Required for Genetic Reassortment by DNA Mismatch Resolution

This experiment teaches that Taq DNA polymerase does not dramatically, if at all, contribute to or interfere with the functioning of Genetic Reassortment by DNA Mismatch Resolution (GRAMMR). Taq DNA polymerase is reported to have a 5′ flap-ase activity, and had been included in the teachings of the previous examples as a safeguard against the possible formation and persistence of undesirable 5′ flaps in the heteroduplexed DNA undergoing GRAMMR.

GRAMMR reactions were set up, as in Example 6, with twenty-one nanograms of the circular double-stranded heteroduplexed GFP plasmid substrate in ten microliter reactions containing 1×NEB ligase buffer, 0.5 mM each dNTP, 0.2 units T4 DNA ligase, 1.0 unit T4 DNA polymerase, 1.0 microliter of a CEL I preparation (fraction 5, described in Example 1), and either 2.5 units, 0.5 units, or no Taq DNA polymerase. After 30 minutes, one microliter aliquots of each reaction were transformed into competent DH5-alpha E. coli which were then plated on LB amp plates. Colonies were picked and cultured. Plasmid DNA was extracted and examined by RFLP analysis followed by sequence analysis of the GFP gene sequences. The RFLP results showed that reassortment of restriction sites, that is, GRAMMR, had occurred both in the presence and the absence of Taq DNA polymerase in the GRAMMR reaction. DNA sequence analysis confirmed these results. Therefore, the data show that Taq DNA polymerase was unnecessary for GRAMMR.

EXAMPLE 8 Alternate Proofreading DNA Polymerases for Genetic Reassortment by DNA Mismatch Resolution

This experiment teaches that Genetic Reassortment by DNA Mismatch Resolution is not limited to the use of T4 DNA polymerase, and that alternate DNA polymerases can be substituted for it.

Reactions were set up, as in Example 6, with twenty-one nanograms of the circular double-stranded heteroduplexed GFP plasmid substrate in ten microliter reactions containing 1×NEB ligase buffer, 0.5 mM each dNTP, 0.2 units T4 DNA ligase (Gibco/BRL), 10 units or 2 units of T7 DNA polymerase, 1.0 microliter of a CEL I preparation (fraction 5, described in Example 1), and 2.5 units of Taq DNA polymerase. After 30 minutes, one microliter aliquots of each reaction were transformed into competent DH5-alpha E. coli which were then plated on LB amp plates. Colonies were picked and cultured. Plasmid DNA was extracted and examined by RFLP analysis followed by sequence analysis of the GFP gene sequences. The RFLP results showed that reassortment of restriction sites, that is, GRAMMR, had occurred in both T7 DNA polymerase-containing reactions. DNA sequence analysis confirmed these results. Therefore, the data show that T7 DNA polymerase can substitute for T4 DNA polymerase for GRAMMR. In addition, they show that individual components and functionalities can be broadly substituted in GRAMMR, while still obtaining similar results.

EXAMPLE 9 Use of Cloned CEL I in the GRAMMR Reaction

This example teaches that CEL I from a cloned source can be used in place of native CEL I enzyme purified from celery in Genetic Reassortment by DNA Mismatch Resolution without any noticeable change in results.

The cDNA of CEL I was cloned from celery RNA. The gene was inserted into a TMV viral vector and expressed. Transcripts of the construct were used to infect Nicotiana benthamiana plants. Infected tissue was harvested, and the CEL I enzyme was purified. The GRAMMR results obtained using the purified enzyme were compared to those using CEL I purified from celery, and were found to be similar.

Reactions were set up using twenty-one nanograms of the circular double-stranded heteroduplexed GFP plasmid substrate in ten microliters containing 1×NEB ligase buffer, 0.5 mM each dNTP, 0.2 units T4 DNA ligase (Gibco/BRL), 1 unit of T4 DNA polymerase, and either 1.0 microliter of CEL I purified from celery (fraction 5, described in Example 1), or 0.3 microliters of CEL I purified from a cloned source. After 30 minutes, one microliter aliquots of each reaction were transformed into competent DH5-alpha E. coli which were then plated on LB amp plates. Colonies were picked and cultured. Plasmid DNA was extracted and examined by RFLP analysis followed by sequence analysis of the GFP gene sequences. The RFLP results showed that reassortment of restriction sites, that is, GRAMMR, had occurred in both the celery-derived CEL I-containing and the cloned CEL I-containing reactions. DNA sequence analysis confirmed these results. Therefore, the data show that CEL I from a cloned source can be used in lieu of CEL I from celery for GRAMMR. In addition, the data demonstrate that it is CEL I activity that is part of the GRAMMR method, rather than a coincidental effect resulting from the purifying steps used in extracting CEL I from celery.

EXAMPLE 10 Molecular Breeding of Tobamovirus 30K Genes in a Viral Vector

In the preceding examples, Genetic Reassortment by DNA Mismatch Resolution has been taught to be useful for reassorting sequences that are highly homologous; for example, wtGFP and Cycle 3 GFP are 96% identical. The present example teaches that GRAMMR can be used to reassort more divergent nucleic acid sequences, such as genes encoding tobamovirus movement proteins.

Heteroduplexes of two tobamovirus movement protein (MP) genes that are approximately 75% identical were generated. The heteroduplex substrate was prepared by annealing partially-complementary single-stranded DNAs of opposite strandedness synthesized by asymmetric PCR; one strand encoding the movement protein gene from the tobacco mosaic virus U1 type strain (TMV-U1) (SEQ ID NO:10), and the other strand encoding the movement protein gene from tomato mosaic virus (TOMV) (SEQ ID NO:09). The sequences of the two partially complementary movement protein genes were flanked by 33 nucleotides of absolute complementarity to promote annealing of the DNAs at their termini and to facilitate PCR amplification and cloning. The annealing reaction took place by mixing 2.5 micrograms of each single-stranded DNA in a 150 microliter reaction containing 333 mM NaCl, 33 mM MgCl₂, 3.3 mM dithiothreitol, 166 mM Tris-HCl, pH 7, and incubating at 95° C. for one minute followed by slow cooling to room temperature. GRAMMR was performed by incubating 5 microliters of the heteroduplex substrate in a 20 microliter reaction containing 1×NEB ligase buffer, 0.5 mM each dNTP, 0.4 units T4 DNA ligase (Gibco/BRL), 2.0 units of T4 DNA polymerase, and CEL I. The CEL I was from a cloned preparation, and the amount used ranged from 2 microliters of the prep through five serial 3-fold dilutions. A seventh reaction with no CEL I was prepared, which served as a control.
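
The CEL I titration described above can be tabulated as follows. This is a sketch that assumes the amounts are expressed as microliter-equivalents of the undiluted CEL I prep and that each dilution step is exactly three-fold; the `cel_i_series` helper and its parameter names are hypothetical.

```python
def cel_i_series(start_ul=2.0, fold=3.0, n_dilutions=5):
    """Return CEL I amounts (microliter-equivalents of undiluted prep):
    the starting amount, n serial dilutions, then a zero-CEL I control."""
    series = [start_ul / fold**i for i in range(n_dilutions + 1)]
    series.append(0.0)  # seventh reaction: no CEL I (control)
    return series


series = cel_i_series()
for i, amount in enumerate(series, start=1):
    print(f"reaction {i}: {amount:.4f} microliter-equivalents of CEL I")
```

Under these assumptions the series spans 2 microliters down to roughly 0.008 microliter-equivalents, followed by the zero-enzyme control.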

After one hour at room temperature, DNA was purified from the reactions using Strataprep spin DNA purification columns (Stratagene, La Jolla, Calif.) and used as templates for PCR reactions using primers designed to anneal to the flanking primer-binding sites of the two sequences. PCR products from each reaction were purified using Strataprep columns, digested with AvrII and PacI, and ligated into the movement protein slot of similarly-cut pGENEWARE-MP-Avr-Pac. This plasmid contained a full-length infectious tobamovirus-GFP clone modified with AvrII and PacI sites flanking the movement protein gene to permit its replacement by other movement protein genes. After transformation of DH5-alpha E. coli and plating, colonies were picked, cultures grown, and DNA was extracted. The movement protein inserts were subjected to DNA sequence analysis from both directions, and the sequence data confirmed that the majority of inserts derived from the GRAMMR-treated material were reassorted sequences made up of both TMV-U1 and TOMV movement protein gene sequences. The DNA sequences of several exemplary GRAMMR MP clones are shown as SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, and SEQ ID NO:15.

EXAMPLE 11 GRAMMR Reassortment to Generate Improved Arsenate Detoxifying Bacteria

Arsenic detoxification is important for mining of arsenopyrite-containing gold ores and other uses, such as environmental remediation. Plasmid pGJ103, containing an arsenate detoxification operon (Ji, G. and Silver, S., Regulation and expression of the arsenic resistance operon from Staphylococcus aureus plasmid pI258, J. Bacteriol. 174, 3684-3694 (1992), incorporated herein by reference), is obtained from Prof. Simon Silver (U. of Illinois, Chicago, Ill.). E. coli TG1 carrying pGJ103, which contains the pI258 ars operon cloned into pUC19, has a MIC (minimum inhibitory concentration) of 4 pg/ml on LB ampicillin agar plates. The ars operon is amplified by mutagenic PCR [REF], cloned into pUC19, and transformed into E. coli TG1. Transformed cells are plated on a range of sodium arsenate concentrations (2, 4, 8, 16 mM). Colonies from the plates with the highest arsenate levels are picked. The colonies are grown in a mixed culture with appropriate arsenate selection. Plasmid DNA is isolated from the culture. The plasmid DNA is linearized by digestion with a restriction endonuclease that cuts once in the pUC19 plasmid backbone. The linearized plasmids are denatured by heating 10 min. at 94° C. The reaction is allowed to cool to promote annealing of the single strands. Partially complementary strands that hybridize have non-base-paired nucleotides at the sites of the mismatches. Treatment with CEL I (purified by the method of Example 9) causes nicking of one or the other polynucleotide strand 3′ of each mismatch. The presence of a polymerase containing a 3′-to-5′ exonuclease (“proofreading”) activity, such as T4 DNA polymerase, allows excision of the mismatch, and subsequent 5′-to-3′ polymerase activity fills in the gap using the other strand as a template. T4 DNA ligase then seals the nick by restoring the phosphate backbone of the repaired strand. The result is a randomization of mutations among input strands to give output strands with potentially improved properties. These output polynucleotides are transformed directly into E. coli TG1 and the cells are plated at higher arsenate levels: 8, 16, 32, 64 mM. Colonies are picked from the plates with the highest arsenate levels and another round of reassortment is performed as above, except that the resulting transformed cells are plated at 32, 64, 128, 256 mM arsenate. The process can then be repeated one or more times with the selected clones in an attempt to obtain additional improvements.
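
The escalating selection scheme can be summarized numerically. In the sketch below, the first three rounds reproduce the arsenate concentrations stated in the text; the general rule (a two-fold series of four plates whose starting concentration rises four-fold per round) is an inference from those three rounds, and applying it to any later round is an assumption. The `plating_series` helper is hypothetical.

```python
def plating_series(round_number, base_mm=2, steps=4):
    """Sodium arsenate plating concentrations (mM) for a selection round.

    Rounds 1 to 3 match the text; later rounds are an extrapolation
    (assumption), not something the text specifies.
    """
    start = base_mm * 4 ** (round_number - 1)
    return [start * 2**i for i in range(steps)]


for rnd in (1, 2, 3):
    print(f"round {rnd}: {plating_series(rnd)} mM arsenate")
```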

EXAMPLE 12 Cloning, Expression and Purification of CEL I Endonuclease

This example teaches the preparation of nucleic acid molecules that were used for expressing CEL I endonuclease from plants, identified herein as p1177MP4-CELI Avr and p1177MP4-CELI 6HIS. In particular, this example refers to disclosures taught in U.S. Pat. Nos. 5,316,931, 5,589,367, 5,866,785, and 5,889,190, incorporated herein by reference.

The aforementioned clones were deposited with the American Type Culture Collection, Manassas, Va. 20110-2209 USA. The deposits were received and accepted on Dec. 13, 2001, and assigned the following Patent Deposit Designation numbers: PTA-3926 (p1177MP4-celI Avr) and PTA-3927 (p1177MP4-celI 6HIS).

1. Celery RNA Extraction:

Celery was purchased from a local market. Small amounts of celery tissue (0.5 to 0.75 grams) were chopped, frozen in liquid nitrogen, and ground in a mortar and pestle in the presence of crushed glass. After addition of 400 microliters of Trizol and further grinding, 700 microliters of the extract were removed and kept on ice for five minutes. Two hundred microliters of chloroform were then added and the samples were centrifuged, left at room temperature for three minutes, and re-centrifuged at 15,000 g for 10 minutes. The aqueous layer was removed to a new tube and an equal volume of isopropanol was added. Tubes were inverted to mix and left at room temperature for 10 minutes followed by centrifugation at 15,000 g for ten minutes at 4° C. The pellet was washed twice in 400 microliters of 70% ethanol, once in 100% ethanol, air dried, and resuspended in 40 microliters of distilled water. One microliter of RNasin was added and 3.5 microliters was run on a 1% agarose gel to check the quality of the RNA prep (Gel picture). The remainder was stored at −70° C. until further use.

2. CEL I Gene Cloning and Expression by a Viral Vector:

The total RNA from celery was subjected to reverse transcription followed by PCR to amplify the cDNA encoding the CEL I gene sequence. In separate reactions, eleven microliters of the total celery RNA prep were mixed with one microliter (50 picomoles) of either CelI-Avr-R or CelI-6H-R, or with two microliters of oligo dT primer. CelI-Avr-R was used to prime cDNA and amplify the native CEL I sequence at the 3′ end of the gene, while CelI-6H-R was used to add a sequence encoding a linker peptide and a 6-His tag to the 3′ terminus of the CEL I gene. The samples were heated to 70° C. for one minute and quick-chilled on ice prior to the addition of 4 microliters of 5× Superscript II buffer, two microliters of 0.1 M DTT, 1 microliter of 10 mM each dNTP, and 1 microliter of Superscript II (Gibco/BRL) to each reaction. The reactions were incubated at 42° C. for one hour.
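The component volumes listed above can be checked with a short calculation. The following is a minimal sketch for the reaction primed with a gene-specific primer (one microliter of primer); the dictionary keys and the helper name `final_concentration` are illustrative and not part of the original protocol.

```python
# Volumes (microliters) for the gene-specific-primer RT reaction described above.
components_ul = {
    "total celery RNA": 11.0,
    "primer (50 pmol)": 1.0,
    "5x Superscript II buffer": 4.0,
    "0.1 M DTT": 2.0,
    "10 mM each dNTP": 1.0,
    "Superscript II": 1.0,
}


def final_concentration(stock, volume_ul, total_ul):
    """Concentration of a stock after dilution into the total reaction volume."""
    return stock * volume_ul / total_ul


total_ul = sum(components_ul.values())

# Buffer strength in "x" units, DTT and dNTP in mM.
buffer_x = final_concentration(5.0, components_ul["5x Superscript II buffer"], total_ul)
dtt_mM = final_concentration(100.0, components_ul["0.1 M DTT"], total_ul)
dntp_mM = final_concentration(10.0, components_ul["10 mM each dNTP"], total_ul)

# 50 pmol of primer in total_ul microliters; pmol per microliter equals micromolar.
primer_uM = 50.0 / total_ul

print(f"total volume: {total_ul} uL")
print(f"buffer: {buffer_x}x, DTT: {dtt_mM} mM, each dNTP: {dntp_mM} mM, primer: {primer_uM} uM")
```

Under these assumptions the buffer comes out at 1× in a 20 microliter reaction. The oligo dT reaction uses two microliters of primer, so its total volume would be one microliter larger and the final concentrations slightly lower.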

PCR amplification of the CEL I cDNA sequence was performed using the method of W. M. Barnes (Proc. Natl. Acad. Sci. USA, 1994 Mar. 15; 91(6):2216-20) with a Taq-Pfu mixture or with Pfu alone. The RT reaction primed with CelI-Avr-R was used as template for a PCR using primers CelI-Pac-F (as the forward primer) paired with CelI-Avr-R (as the reverse primer). In other PCRs, the RT reaction that was primed with oligo dT was used as template for both of the above primer pairs. All PCR reactions were performed in 100 microliters with 30 cycles of annealing at 50° C. and two minutes of extension at 72° C. Aliquots of the resulting reactions were analyzed by agarose gel electrophoresis. Reactions in which Pfu was used as the sole polymerase showed no product. All reactions performed with the Taq/Pfu mixtures yielded product of the expected size. However, those amplified from cDNA primed with CEL I-specific primer pairs gave more product than reactions amplified from cDNA primed with oligo-dT. DNAs from the PCR reactions that gave the most product were purified using a Zymoclean DNA spin column kit, digested with PacI and AvrII, gel-isolated, and ligated into PacI and AvrII-digested plasmid pRT130, a tobamovirus-based GENEWARE® vector. Two microliters of each ligation were transformed into DH5α competent E. coli and cultured overnight on LB-amp agar plates. Colonies were picked and grown overnight in liquid culture, and plasmid DNA was isolated using a Qiagen plasmid prep kit. Twelve clones from each construct were screened by digestion with PacI and AvrII, and 11 of 12 of each set were positive for insert of the correct size. Ten of the clones for each construct were transcribed in vitro and RNA was inoculated to N. benthamiana plants. In addition, the CEL I gene inserts in both sets of ten clones were subjected to sequence analysis. Several clones containing inserts encoding the native form of CEL I had sequence identical to the published CEL I sequence in WO 01/62974 A1. One clone containing an insert encoding CEL I fused to a 6-Histidine sequence was identical to the published CEL I sequence. One clone of each (pRT130-celI Avr-B3 and pRT130-celI 6His-A9, respectively) was selected for further work. The CEL I-encoding sequences in these clones were subsequently transferred to another GENEWARE vector. It should be noted that applicant's designations for each of the clones were shortened in the aforementioned deposit with the American Type Culture Collection; that is, p1177MP4-celI Avr-B3 is referred to as p1177MP4-celI Avr, and p1177MP4-celI 6His-A9 is referred to as p1177MP4-celI 6His. The clone p1177MP4-celI Avr contained the CEL I open reading frame extending from nucleotide 5765 to 6655, and the clone p1177MP4-celI 6His-A9 contained the CEL I open reading frame extending from nucleotide 5765 to 6679.
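The two open reading frame coordinates given above can be compared directly. The sketch below assumes the stated positions are 1-based and inclusive, and that both ranges treat the stop codon in the same way; neither point is stated in the text, so the codon counts are illustrative only.

```python
def orf_length(start, end):
    """Length in nucleotides of an ORF given 1-based inclusive coordinates (assumed)."""
    return end - start + 1


native_nt = orf_length(5765, 6655)   # p1177MP4-celI Avr
tagged_nt = orf_length(5765, 6679)   # p1177MP4-celI 6His-A9

# Both lengths should be whole numbers of codons if the coordinates are in frame.
native_codons, native_rem = divmod(native_nt, 3)
tagged_codons, tagged_rem = divmod(tagged_nt, 3)

extra_nt = tagged_nt - native_nt
extra_codons = tagged_codons - native_codons

print(native_nt, native_codons, native_rem)
print(tagged_nt, tagged_codons, tagged_rem)
print(extra_nt, extra_codons)
```

On these assumptions the tagged construct is 24 nucleotides (8 codons) longer than the native one, which is consistent with six histidine codons plus a short linker.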

3. Assay of Cloned CEL I Activities.

To determine whether the GENEWARE constructs containing CEL I sequences could produce active CEL I enzyme, samples of pRT130-celI Avr, pRT130-celI 6His, and GFP-GENEWARE control-infected plants were harvested and homogenized in a small mortar and pestle in Tris-HCl at pH 8.0. Extracts were clarified and assayed for supercoiled DNA nicking activity. Each supercoiled DNA nicking assay was performed in a reaction containing 0.5 micrograms of a supercoiled plasmid prep of a pUC19-derivative in 1× NEB ligase buffer in a total volume of 10 microliters. The amounts of plant extract added to the reactions were 0.1 microliter, 0.01 microliter, or 0.001 microliter. Reactions were incubated at 42° C. for 30 minutes and run on a 1% TBE-agarose gel in the presence of ethidium bromide. Little or no nicking activity was detected in the GFP-GENEWARE control-infected plant extract, whereas extracts from plants infected with the CEL I-GENEWARE constructs showed appreciable amounts of activity against the plasmid DNA substrate.

Additional activity assays were performed on extracts of plants inoculated with pRT130-celI Avr-B3 and pRT130-celI 6His-A9. In these assays, intracellular fluid was washed from infected leaves and assayed separately from material obtained from the remaining washed leaf tissues. Assays were performed as described above with the exception that the incubation was at 37° C. for one hour. Samples were run on a 1% TBE-agarose gel in the presence of ethidium bromide and photographed.

4. Purification of 6His-Tagged CEL I from Infected N. benthamiana Plants.

N. benthamiana plants were inoculated with RNA transcripts from pRT130-celI 6His-A9 at 20-21 days post-sowing. Tissues were harvested from 96 infected plants at 10 days post-inoculation and subjected to intracellular fluid washes. Briefly, infected leaf and stem material was vacuum infiltrated for 30 seconds twice with chilled infiltration buffer (50 mM phosphate pH 4 in the presence of 7 mM β-ME). Infiltrated tissues were blotted to absorb excess buffer and secreted proteins were recovered by centrifugation at 2500×g for 20 min using a basket rotor (Beckman). PMSF was added to the extracted intracellular fluid (IF) containing recombinant CEL I to a final concentration of 1 mM, and the mixture was incubated at 25° C. for 15 min with stirring. After addition of imidazole (pH 6.0) and NaCl to the extract to final concentrations of 5 mM and 0.5 M, respectively, the IF was adjusted to pH 5.2 and filtered through a 1.2 μm Sartorius GF membrane (Whatman) to remove most of the Rubisco and green pigments. Immediately after clarification, the pH was adjusted to 7.0 using concentrated NaOH solution and the IF was incubated on ice for 20 min to allow non-proteinaceous material to precipitate. The IF was further clarified using 0.8 μm or 0.65/0.45 μm Sartorius GF (Whatman). Recombinant CEL I was purified from the clarified IF by metal chelating affinity chromatography using Ni²⁺ Fast Flow Sepharose (Amersham Pharmacia Biotech, NJ) equilibrated with binding buffer (50 mM phosphate, 0.5 M NaCl; pH 7.0) containing 5 mM imidazole, with a linear velocity of 300 cm/hr. Unbound protein was washed with 20 mM imidazole/binding buffer, and CEL I was eluted from Ni²⁺ Sepharose with a linear gradient of 20 to 400 mM imidazole in the binding buffer. Fractions still containing imidazole were assayed for supercoiled DNA nicking activity as described above but were found to have negligible activity. The same fractions were then dialyzed against 0.1 M Tris-HCl, pH 8.0 in the presence of ZnCl₂ using 10 kD MWCO dialysis tubing (Pierce) and assayed again. The supercoiled DNA nicking activity was restored after this dialysis.

IF and purified CEL I protein were analyzed using Sodium Dodecyl Sulfate Polyacrylamide Gel Electrophoresis (SDS-PAGE) precast Tris-glycine gels (Invitrogen, Carlsbad, Calif.) in the buffer system of Laemmli with an Xcell II Mini-Cell apparatus (Invitrogen, Carlsbad, Calif.). The protein bands were visualized by Coomassie brilliant blue and by silver staining. SDS-PAGE gels were scanned and analyzed using a Bio-Rad gel imager.

Mass Spectrometry of Purified CEL I

The average molecular mass of the purified CEL I was determined by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF). An aliquot of CEL I was diluted 1:10 with 50% acetonitrile/water, mixed with sinapinic acid matrix (1:1 v/v), and analyzed using a PE Biosystem DE-Pro mass spectrometer. The mass spectrometry was performed using an accelerating voltage of 25 kV and in the positive-linear ion mode.

Mass Spectrometry of Peptides Isolated from Purified CEL I.

CEL I was separated by SDS-PAGE on a 14% gel and stained with Coomassie brilliant blue. A single homogeneous band was visible. This band was excised and de-stained completely. Protein was reduced in the presence of 10 mM DTT in 50% acetonitrile for 30 min at 37° C., and reduced sulfhydryl groups were blocked in the presence of 28 mM iodoacetamide in 50% acetonitrile for 30 min at 24° C. in the absence of light. Gel pieces were washed with 50% acetonitrile and, after partial dehydration, the excised CEL I band was macerated in a solution of high purity trypsin (Promega). The proteolytic digestion was allowed to continue at 37° C. for 16 h. The resulting peptides were eluted from gel pieces with 50% acetonitrile and 0.1% trifluoroacetic acid (TFA) and concentrated in a SpeedVac. The peptides were analyzed by MALDI-TOF. Mixed tryptic digests were crystallized in a matrix of α-cyano-4-hydroxycinnamic acid and analyzed using a PerSeptive Biosystem DE-STR MALDI-TOF mass spectrometer equipped with delayed extraction, operated in the reflector-positive ion mode with an accelerating voltage of 20 kV. Expected theoretical masses were calculated by MS-digest (Protein Prospector) or the GPMAW program (Lighthouse Data, Odense, Denmark). For tandem mass spectrometry (nano electrospray ionization (ESI)), peptide samples were diluted with 5% acetonitrile/0.1% formic acid and subjected to LC MS/MS, analyzed on a quadrupole orthogonal time-of-flight mass spectrometry instrument (Micromass UK Ltd., Manchester, UK). The data were processed by MassLynx and the database was searched by Sonar.

Virally expressed, recombinant CEL I was secreted to the IF. Clarified IF-extracted material was used to purify the His-tag CEL I activity. CEL I was purified using one-step Ni²⁺ affinity chromatography separation. A highly purified, homogeneous single protein band was obtained, as determined by Coomassie-stained SDS-PAGE and mass spectrometry. The size of the mature protein and the percent glycosylation concur with what has been reported for the CEL I protein isolated from celery (Yang et al., 2000). The purified CEL I has an average molecular mass of 40 kD as determined by MALDI-TOF mass spectrometry, which indicates 23.5% glycosylation by mass. CEL I has four potential glycosylation sites at amino acid positions 58, 116, 134, and 208. A mono-isotopic mass of 2152.6086 Da (2152.0068 Da theoretical), corresponding to the mass of the peptide 107-125 (K)DMCVAGAIQNFTSQLGHFR(H) that was recovered by MALDI-TOF, indicates that asparagine 116 is not glycosylated. Together, these gel analyses and mass spectrometry data indicate that a significant fraction of the CEL I protein was recoverable from the intracellular space, and that the protein was correctly processed in the N. benthamiana plant.
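The position of the potential glycosylation site within the recovered peptide, and the polypeptide mass implied by the figures above, can be worked through as follows. This is a minimal sketch: it assumes the standard N-linked glycosylation sequon (Asn, then any residue except Pro, then Ser or Thr), and the function names are illustrative.

```python
import re

# Tryptic peptide 107-125 quoted above, without the flanking (K) and (H) residues.
PEPTIDE = "DMCVAGAIQNFTSQLGHFR"
PEPTIDE_START = 107  # residue number of the first amino acid in the peptide


def find_sequons(seq, offset=1):
    """Return residue numbers of Asn in N-X-S/T sequons (X is not Pro).

    A lookahead is used so that overlapping sequons would all be reported.
    """
    return [m.start() + offset for m in re.finditer(r"(?=N[^P][ST])", seq)]


def polypeptide_mass(total_kd, glyco_fraction):
    """Mass left for the polypeptide if glyco_fraction of total mass is glycan."""
    return total_kd * (1.0 - glyco_fraction)


print(find_sequons(PEPTIDE, PEPTIDE_START))      # [116]
print(round(polypeptide_mass(40.0, 0.235), 1))   # 30.6
```

The only sequon in this peptide places the asparagine at residue 116, matching the site named in the text, and a 40 kD glycoprotein that is 23.5% glycan by mass leaves roughly 30.6 kD for the polypeptide.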

For subsequent experiments, the 6-His tagged CEL I gene was produced using p1177MP4-celI 6His-A9. This clone was transcribed and inoculated onto N. benthamiana plants, which were harvested 8 days post infection. The plant material was combined with 2 volumes of extraction buffer (500 mM NaCl, 100 mM NaPi, 25 mM Tris pH 8.0, 7 mM beta-mercaptoethanol, 2 mM PMSF) and vacuum infiltrated. Following buffer infiltration, the tissue was macerated in a juice extractor, and the resulting green juice was adjusted to 4% w/v polyethylene glycol and let stand at 4° C. for one hour. The green juice was clarified either by centrifugation at low speed (3500×g) for 20 minutes or by combining it with perlite (2% w/v) and filtering through a 1.2 μm filter. The tagged CEL I can be selectively purified from the clarified green juice by metal affinity chromatography. The green juice was either combined with nickel-NTA resin for batch binding of the CEL I, or purification was performed in column format, where the green juice was permitted to flow through a bed of nickel-NTA resin. For binding, the clarified green juice was adjusted to 10% w/v glycerol and 10 mM imidazole. Following binding, the resin was washed extensively with wash buffer (330 mM NaCl, 100 mM NaPi, pH 8.0, 10 mM imidazole) and the bound CEL I enzyme was eluted from the nickel-NTA resin in 2 resin-bed volumes of 1× phosphate-buffered saline (PBS) containing 400 mM imidazole. The CEL I preparation was subsequently dialyzed against 1× PBS to remove the imidazole, assayed for activity, and stored at 4° C. or at −20° C. with or without glycerol until use.

EXAMPLE 13 Cloning, Expression and Use of RES I Endonuclease

This example teaches the construction of a cDNA library from Selaginella lepidophylla, the identification of a nucleic acid sequence from the library that encodes an endonuclease, and the expression of the new endonuclease, herein designated as “RES I.”

RNA was extracted from tissues of the resurrection plant, Selaginella lepidophylla, using the Trizol method, and oligo-dT primed cDNA was prepared using standard methodology. Resulting cDNAs were ligated into a GENEWARE-based cloning vector and the ligation products were transformed into competent E. coli cells. Bacterial colonies containing GENEWARE cDNA clones were picked at random and grown as liquid cultures prior to DNA prepping and determination of the cloned cDNA sequences. The sequence files for the cloned Selaginella cDNAs were loaded into a database which was then searched by BLAST (Basic Local Alignment Search Tool) analysis for sequences that had similarity to the DNA sequence of the CEL I gene. BLAST analysis was also performed on other DNA sequence databases containing sequences of cDNAs obtained from other species.

BLAST hits that showed some level of homology to the celery CEL I sequence were identified in libraries from several species and the corresponding GENEWARE-cDNA clones were re-arrayed into a single set of GENEWARE-cDNA clones. This set of cDNA clones was then transcribed in vitro to generate infectious GENEWARE transcripts, which were then inoculated onto leaves of Nicotiana benthamiana plants for expression analysis of the cDNA sequences encoded within the GENEWARE viral genome. At seven days post-inoculation, leaf samples were taken from the infected plants and homogenized in two volumes of water. The extracts were then assayed for supercoiled DNA nicking and cleavage activity.

Each supercoiled DNA nicking assay was performed in a reaction containing 0.5 micrograms of a supercoiled plasmid prep of a pUC19-derivative in 1× NEB T4 DNA ligase buffer in a total volume of 10 microliters. The amounts of plant extract added to the reactions were 1 microliter, 0.33 microliter, or 0.011 microliter. Reactions were incubated at 37° C. for 30 minutes and run on a 1% TAE-agarose gel in the presence of Gelstar fluorescent DNA staining reagent. Little or no nicking activity was detected in uninfected plant extracts, whereas only extracts from plants infected with GENEWARE constructs containing cDNAs for a single gene from Selaginella lepidophylla showed appreciable amounts of activity against the plasmid DNA substrate.

A sample of the aforementioned Selaginella lepidophylla gene, as shown in SEQ ID NO:16, was mailed to the American Type Culture Collection, Manassas, Va. 20110-2209 USA on Jul. 29, 2002. The deposit was received and accepted on Jul. 30, 2002, and assigned the following Patent Deposit Designation number: PTA-4562.

The complete gene sequences of these clones were determined and PCR primers were designed to amplify the open reading frame minus any non-coding 5′ and 3′ sequences and to add a six histidine tail to the C-terminus of the encoded protein. The primers were then used to amplify the ORF from one of the active full-length Selaginella clones. The resulting PCR product was then cloned into the GENEWARE vector pDN4 between the PacI and AvrII sites for expression in planta. The resulting clone, pLSB2225, was sequenced to confirm that the gene had been inserted correctly, and then transcribed in vitro followed by inoculation of the infectious transcripts onto N. benthamiana plants. Seven days post inoculation, infected plant extracts were made as above and assayed for supercoiled DNA nicking and digestion activity to confirm the activity of the cloned enzyme.

Each supercoiled DNA nicking assay was performed in a reaction containing 0.5 micrograms of a supercoiled plasmid prep of a pUC19-derivative in 1× NEB E. coli DNA ligase buffer in the presence of 50 mM KCl in a total volume of 10 microliters. The amounts of plant extract added to the reactions were 0.2 microliter, 0.04 microliter, 0.008 microliter, or 0.0016 microliter. Reactions were incubated at 37° C. for 30 minutes and run on a 0.8% TAE-agarose gel in the presence of Gelstar fluorescent DNA staining reagent. Little or no nicking activity was detected in uninfected plant extracts, whereas extracts from plants infected with the GENEWARE-Selaginella construct pLSB2225 showed appreciable amounts of activity against the plasmid DNA substrate.

After positive results were obtained in that assay, extracts of pLSB2225-infected plants were used in a GRAMMR experiment to test the ability of this enzyme to operate as a component of the mismatch resolution reaction in place of the GENEWARE-produced CEL I enzyme of celery origin.

EXAMPLE 14 Use of RES I in the GRAMMR Reaction

This example teaches that RES I can be used in place of native CEL I enzyme purified from celery in Genetic Reassortment By DNA Mismatch Resolution without any noticeable change in results.

GRAMMR was performed between the wild-type Aequorea victoria GFP gene (Prasher, et al., Gene 111(92)229) in a pBS derivative (Stratagene, La Jolla, Calif.) encoded by pBSWTGFP (SEQ ID NO:03) and a variant with mutations to increase fluorescence intensity in E. coli, and to alter the emission wavelength to blue light emission (Crameri, et al., Nat Biotechnol 14(96)315; Heim et al., PNAS 91(94)12501; Yang, et al., J Biol Chem 273(98)8212). This variant gene, encoded by the plasmid pBSC3BFP, as shown in SEQ ID NO:18, encodes a fluorescent protein that emits bright blue light when excited by longwave UV light.

The GRAMMR reactions were performed on GFP/c3BFP heteroduplexes in a circular, double-stranded plasmid DNA context. The circular, whole-plasmid heteroduplex DNA substrates were prepared by first linearizing pBSWTGFP (SEQ ID NO:03) and pBSC3BFP (SEQ ID NO:18) by digestion with Kpn I and NgoM IV, respectively, then purifying the digested DNA using DNA spin columns. Next, 200 nanograms of each of the two linearized plasmids were mixed and brought to 1×SSPE (180 mM NaCl, 10 mM NaH₂PO₄, 1 mM EDTA at pH 7.4) in a volume of 20 microliters. The mixture was then incubated at 95 degrees Celsius for 4 minutes and plunged into ice water, where it remained for 10 minutes prior to incubation at 37 degrees Celsius. After 30 minutes, the annealed DNA sample was transferred back to ice, where it was held until use in GRAMMR reactions.

Two independent series of reassortment reactions were performed to compare CEL I with RES I in their abilities to facilitate sequence reassortment by GRAMMR. Each GRAMMR reaction contained 1 unit of T4 DNA polymerase, 2 units of E. coli DNA ligase, and 5 nanomoles of each dNTP in 1× NEB E. coli ligase buffer supplemented with KCl to 50 mM. Two separate enzyme dilution series were then performed. To each of two series of tubes containing aliquots of the above cocktail, one microliter aliquots of GENEWARE-expressed CEL I or RES I extracts at dilutions of 1/3, 1/9, 1/27, 1/81, or 1/243 were added. An endonuclease-free control reaction was also prepared. To each of the reactions, one microliter aliquots containing 20 nanograms of the annealed DNA heteroduplex substrate were added and the reactions were incubated at room temperature for one hour and on ice for 30 minutes prior to transformation into competent E. coli.

Green fluorescent protein (GFP) and blue fluorescent protein (BFP) could be visualized in the resulting colonies by long wave UV illumination. The parental wild-type GFP has dim green fluorescence, and the parental c3BFP gave bright blue fluorescence. In the genes encoding these fluorescent proteins, the sequences that determine the emission color and those that govern fluorescence intensity are at different positions from one another. It is expected that DNA reassortment would result in the “de-linking” of the sequences that determine the emission color from those that govern fluorescence intensity. As a consequence, the resultant progeny would be expected to exhibit reassortment of the functional properties of emission color and intensity. Therefore, a measure of the extent of the DNA reassortment that had taken place in each reaction could be scored by examining the color and intensity of fluorescence from the bacterial colonies on the corresponding plates. In the zero-nuclease control, only dim green and bright blue colonies were observed. However, on plates with cells transformed with DNAs from the reactions containing either CEL I or RES I, some bright green as well as some dim blue colonies were observed, indicating that reassortment of DNA sequences had taken place. DNA sequence analysis confirmed that this was indeed the case, that on average the recovery of shuffled clones was greater than 85% for both CEL I and RES I, and that the number and distribution of information transfer events was similar for both enzymes. However, it appeared that the activity of RES I in this experiment was several-fold higher than that of CEL I, as indicated by the low transformation efficiency of reactions treated with the higher concentrations of the RES I preparation.
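The phenotypic scoring described above can be sketched as a simple classification. In this minimal sketch the colony list is hypothetical example data, not results from the experiment, and the function names are illustrative. Note that a phenotype score only detects clones in which the color and intensity determinants have been separated, so it can undercount reassortment events that sequence analysis would reveal.

```python
# Parental phenotypes as (emission color, intensity), per the description above.
PARENTAL = {("green", "dim"), ("blue", "bright")}


def is_reassorted(color, intensity):
    """A colony is scored as reassorted if it matches neither parental phenotype."""
    return (color, intensity) not in PARENTAL


def reassorted_fraction(colonies):
    """Fraction of colonies whose phenotype is non-parental (0.0 for an empty plate)."""
    colonies = list(colonies)
    if not colonies:
        return 0.0
    return sum(is_reassorted(c, i) for c, i in colonies) / len(colonies)


# Hypothetical plate, for illustration only.
example_plate = [
    ("green", "dim"),      # parental wild-type GFP
    ("blue", "bright"),    # parental c3BFP
    ("green", "bright"),   # reassorted
    ("blue", "dim"),       # reassorted
]
print(reassorted_fraction(example_plate))  # 0.5
```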

EXAMPLE 15 Molecular Breeding of Highly Divergent Tobamovirus 30K Genes in Viral Vectors Using Plasmid-on-Plasmid Genetic Reassortment by DNA Mismatch Resolution (POP GRAMMR)

Example 10 taught the reassortment of movement protein (MP) genes from several divergent strains of tobamovirus (approximately 75% identical; cloned into the pGENEWARE-MP-Avr-Pac vector) using Genetic Reassortment by DNA Mismatch Resolution (GRAMMR). This example teaches the use of Plasmid-on-plasmid GRAMMR (POP GRAMMR) for reassorting even more highly divergent species.

Starting parental MP genes from the tobamoviruses TMV-Cg (SEQ ID NO:19), TMV-Ob (SEQ ID NO:20), TMV-U2 (SEQ ID NO:21), TMV-U1 (SEQ ID NO:10), and tomato mosaic virus (ToMV) (SEQ ID NO:09) were used. The plasmid pGENEWARE-ToMV MP was linearized by digestion with Sma I. The pGENEWARE plasmids containing the MP genes from either TMV-Cg, TMV-Ob, TMV-U2, or TMV-U1 were digested with Stu I. The digested pGENEWARE-MP constructs were purified using DNA spin columns. The following heteroduplex pairs were generated: pGENEWARE-Cg MP and pGENEWARE-ToMV MP, pGENEWARE-TMV-Ob MP and pGENEWARE-ToMV MP, pGENEWARE-TMV-U2 MP and pGENEWARE-ToMV MP, and pGENEWARE-TMV-U1 MP and pGENEWARE-ToMV MP. The MP gene sequences paired in these heteroduplexes are approximately 47%, 58%, 62%, and 75% identical, respectively. Heteroduplex DNA was generated by mixing 200 nanograms of each of the two linearized plasmids in 1×SSPE (180 mM NaCl, 10 mM NaH₂PO₄, 1 mM EDTA, at pH 7.4) in a volume of 20 microliters. The mixture was incubated at 95 degrees Celsius for 4 minutes and plunged into ice water, where it remained for 10 minutes prior to incubation at 37 degrees Celsius. After 30 minutes, the annealed DNA sample was transferred back to ice, where it was held until use in GRAMMR reactions.
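Percent identity figures such as those quoted above are computed from an alignment of the two parental sequences. The following is a minimal sketch of one common definition (identical columns divided by all aligned columns, counting a gap opposite a base as a mismatch); the text does not state which definition was used, and the short sequences shown are toy data, not the MP genes.

```python
def percent_identity(a, b):
    """Percent identity between two pre-aligned sequences of equal length.

    Gaps are written as '-'. Columns where both sequences have a gap are
    skipped; a gap opposite a base counts as a mismatch (an assumption).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def mismatch_positions(a, b):
    """0-based alignment columns where the strands of a heteroduplex would differ."""
    return [i for i, (x, y) in enumerate(zip(a.upper(), b.upper())) if x != y]


# Toy aligned sequences, for illustration only.
seq1 = "ATGGCTCTAG"
seq2 = "ATGACTTTAG"
print(percent_identity(seq1, seq2))    # 80.0
print(mismatch_positions(seq1, seq2))  # [3, 6]
```

Each mismatched column corresponds to a site in the annealed heteroduplex where a mismatch endonuclease could act, so lower identity means more potential resolution sites per molecule.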

Each 10 microliter GRAMMR reaction contained 1 unit of T4 DNA polymerase, 2 units of E. coli DNA ligase, and 0.5 mM of each dNTP in 1× NEB E. coli DNA ligase buffer supplemented with KCl to 50 mM. A one microliter aliquot of CEL I (diluted 1/3, 1/9, 1/27, 1/81, 1/243, or 1/729) was next added. An endonuclease-free control reaction was also prepared. To each of the reactions, a one microliter aliquot containing 20 nanograms of the annealed DNA heteroduplex substrate was added and the reactions were incubated at room temperature for one hour and on ice for 30 minutes prior to transformation into competent E. coli.
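The reaction composition above can be restated numerically. This minimal sketch converts the dNTP concentration to an amount per reaction and lists the threefold endonuclease dilution series; the helper name `nanomoles` is illustrative.

```python
from fractions import Fraction


def nanomoles(conc_mM, volume_ul):
    """Amount in nanomoles: 1 mM equals 1 nmol per microliter."""
    return conc_mM * volume_ul


# 0.5 mM of each dNTP in a 10 microliter reaction.
dntp_nmol = nanomoles(0.5, 10)

# Threefold serial dilutions of the endonuclease preparation: 1/3 through 1/729.
dilutions = [Fraction(1, 3 ** k) for k in range(1, 7)]

print(dntp_nmol)                      # 5.0
print([str(d) for d in dilutions])    # ['1/3', '1/9', '1/27', '1/81', '1/243', '1/729']
```

The 5 nanomoles obtained here agrees with the amount of each dNTP stated for the GRAMMR reactions in Example 14.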

DNA sequence analysis was performed from both directions, and the sequence data showed that a significant number of clones derived from the GRAMMR-treated material were reassorted sequences containing information from both parental movement protein gene sequences. The DNA sequences of several exemplary GRAMMR pGENEWARE-MP clones are shown as follows: TMV-Cg/ToMV clones, SEQ ID NO:22 and SEQ ID NO:23; TMV-Ob/ToMV clones, SEQ ID NO:24 and SEQ ID NO:25; TMV-U2/ToMV clones, SEQ ID NO:26 and SEQ ID NO:27; and TMV-U1/ToMV clones, SEQ ID NO:28 and SEQ ID NO:29.

EXAMPLE 16 Homologies to RES I Endonuclease

The amino acid sequence of RES I was compared to a database by BLAST (Basic Local Alignment Search Tool) analysis employing standard default parameters for sequences that had similarity to the RES I sequence.

In terms of the percentage of sequence identity of RES I compared to other enzymes by BLAST score (particularly CEL I, ZEN 1, BFN 1 and DSA6), the closest % identities found for the mature amino acid sequence (leader sequence removed) include 52% identity to CEL I (gi 7229711, gb AAF42954.1), 50% identity to endonuclease ZEN 1 from Zinnia elegans (gi 3242447, dbj BAA28948.1), 51% identity to bifunctional nuclease BFN1 of Arabidopsis thaliana (gi 21594913, gb AAM65931.1), 50% identity to senescence-associated protein 6 (DSA6) of Hemerocallis hybrid cultivar (gi 3551956, gb AAC34856.1), 55% identity to the bifunctional nuclease of Zinnia elegans (gi 4099835, gb AAD00695.1), 51% identity to putative endonuclease precursor from Lycopersicon esculentum (gi 114144725, emb CAJ87709.1), and 53% identity to Os0g0128100, Oryza sativa (gi 113531440, dbj BAF03823.1).
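The identities listed above can be tabulated and ranked. The sketch below simply restates the reported values for the mature RES I sequence; the dictionary labels are abbreviated for illustration, and the helper `at_least` with its threshold argument is a hypothetical convenience for comparing the values against any chosen cutoff.

```python
# Percent identity to mature RES I, as reported in the paragraph above.
REPORTED_IDENTITY = {
    "CEL I": 52,
    "ZEN 1 (Zinnia elegans)": 50,
    "BFN1 (Arabidopsis thaliana)": 51,
    "DSA6 (Hemerocallis)": 50,
    "bifunctional nuclease (Zinnia elegans)": 55,
    "putative endonuclease precursor (Lycopersicon esculentum)": 51,
    "Os0g0128100 (Oryza sativa)": 53,
}


def at_least(identities, threshold):
    """Names of sequences whose identity meets or exceeds the threshold, sorted."""
    return sorted(name for name, pct in identities.items() if pct >= threshold)


# Rank from most to least similar.
ranked = sorted(REPORTED_IDENTITY.items(), key=lambda kv: kv[1], reverse=True)

for name, pct in ranked:
    print(f"{pct}%  {name}")

print(at_least(REPORTED_IDENTITY, 53))
print(at_least(REPORTED_IDENTITY, 60))  # empty: no listed value reaches 60%
```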

PCT publication WO 01/62974 describes a nucleic acid encoding the mismatch endonuclease CEL I, and a DNA mismatch detection method in which CEL I is used to cleave DNA at a mismatch. The publication also describes the method using mismatch endonucleases having greater than 60% sequence identity to CEL I. It does not teach a method of generating sequence variants. No teaching is made of the mismatch endonuclease RES I, or of methods using RES I. The amino acid sequence of RES I is about 48% identical to CEL I with the leader sequences intact and about 52% identical to CEL I when comparing the amino acid sequences of the mature enzymes.

Enzymes such as CEL I and RES I appear to belong to a family of enzymes involved in the degradation of nucleic acids during senescence. For example, BFN1 is induced during leaf and stem senescence in Arabidopsis (see Perez-Amador et al. Plant Phys 122 (2000)169). Senescence is an important phase in the plant life cycle that is thought to contribute to fitness through recycling of nutrients to actively growing regions (Buchanan-Wollaston, V., J Exp Bot 48 (1997)181). BFN1 has about 72% identity to CEL I and 51% identity to RES I when the mature forms of the enzymes are compared by BLAST analysis.

EXAMPLE 17 Determination of Mismatch Endonuclease Activities Suitable for Use in the GRAMMR Reaction

Examples 13 and 14 taught that RES I could be used in place of native CEL I enzyme purified from celery in Genetic Reassortment By DNA Mismatch Resolution. Other enzymes can be evaluated for their suitability for use as a mismatch endonuclease in the GRAMMR reaction using a similar approach. For example, an isolated protein having mismatch endonuclease activity comprising an amino acid sequence that is at least 60% identical to SEQ ID NO:17 as determined by BLAST analysis can be used, and its mismatch endonuclease activity confirmed by comparison with CEL I or RES I as in Example 14. Suitable candidate enzymes can be isolated from natural sources, or novel sequences can be generated artificially by any appropriate mutagenesis method. Suitable candidate enzymes can also be generated using DNA shuffling or reassortment technologies including GRAMMR.

1. An isolated protein having mismatch endonuclease activity comprising an amino acid sequence that is at least 60% identical to SEQ ID NO:17 as determined by BLAST analysis.
 2. The protein of claim 1 wherein the amino acid sequence is at least 65% identical to SEQ ID NO:17 as determined by BLAST analysis.
 3. The protein of claim 1 wherein the amino acid sequence is at least 70% identical to SEQ ID NO:17 as determined by BLAST analysis.
 4. The protein of claim 1 wherein the amino acid sequence is at least 80% identical to SEQ ID NO:17 as determined by BLAST analysis.
 5. The protein of claim 1 wherein the amino acid sequence is at least 90% identical to SEQ ID NO:17 as determined by BLAST analysis.
 6. The protein of claim 1 wherein the amino acid sequence is at least 95% identical to SEQ ID NO:17 as determined by BLAST analysis. 